ABSTRACT


Although the proportion of "functional" DNA in eukaryotic genomes is both debatable and subject to definition, most sequences gathered for phylogenetic purposes are indisputably functional. For example, patterns of variation are likely to be strongly constrained in ribosomal RNAs because of their structural and catalytic roles in protein translation, and in protein-coding genes because of protein function itself. Although seemingly obvious, these concerns are usually ignored by workers producing gene trees. We have examined the extent of functional constraints in land-plant rbcL sequences. Not only do rbcL sequences appear to change with essentially clocklike regularity, but nucleotide-based cladograms imply that approximately 97.5% of codon changes on internal branches are functionally conservative (i.e., synonymous or functionally labile). From this perspective, rbcL evolution appears to be strongly constrained by function. Transforming nucleotide data into ad hoc string recognitions alters the size of the unit character sufficiently to highlight "blocks" of conservative information that may or may not be functionally constrained. Simultaneous cladistic analysis of all available evidence will highlight the proportion of congruent information, despite diverse functional constraints among the characters analyzed. We demonstrate the strength of this approach using different forms of the same rbcL evidence (i.e., nucleotides, strings, or amino acids) in combination with the seed-plant data of Nixon et al. (1994).
Diversification of the major clades of extant land plants probably dates from the Silurian to Cretaceous. During the Silurian-Devonian, liverworts, hornworts, mosses, and tracheophytes formed distinct lineages. Differentiation of the tracheophyte clades, notably angiosperms and other seed plants, began by the Devonian. The estimation of land-plant phylogeny, a research goal spanning over 400 million years of cladogenesis and extinction, is no simple task. For example, many groups lack strong morphological similarities that might suggest patterns of relationship.

Recent years have seen an explosion of interest in molecular information, with its promise of easily interpreted similarities for bridging otherwise large phenotypic gaps. In particular, the plastid rbcL gene (which encodes the large subunit of RuBisCO, ribulose-1,5-bisphosphate carboxylase/oxygenase, a primary enzyme in carbon fixation) has been sequenced extensively, with primary emphasis on the angiosperms (Clegg, 1993; Chase et al., 1993). Arguing from expected synonymous substitutions per site under a particular rate assumption, Clegg (1993) suggested that rbcL sequences should be phylogenetically informative for the time interval 400-100 million years before present. We argue here that this and similar assertions are incomplete. From direct estimation of total substitution rates (i.e., patristic distances optimized on cladograms; see Albert et al., 1992a, 1993; Albert & Mishler, 1992; Albert et al., [...]), we will demonstrate that divergence-time asymmetries among taxa restrict rbcL-based hypotheses of land-plant phylogeny far more than do rate asymmetries.

We have examined the internal stability of land-plant rbcL evidence through conversion of nucleotide information into different data forms, including presence/absence of ad hoc nucleotide strings. Cladograms produced from nucleotide, string, and translated amino acid data are only partially congruent. Character optimization on both nucleotide and string trees reveals extensive functional conservation through the predominance of silent changes and labile (function-conserving) amino acid replacements. Hence, rbcL nucleotides are no less functionally constrained than morphological characters (contra Olmstead, 1989; Sytsma et al., 1991; Clegg, 1993).

Although the separation of protein-functional from cladogenetic history may not be entirely possible, the extent to which functional history reflects phylogeny might be assessed through congruence studies with characters expected to carry diverse patterns of functional constraints. As such, we have performed total-evidence analyses at the seed-plant level using, as a "constant," a new matrix of primarily morphological data (Nixon et al., 1994, this issue). It emerges that combining rbcL nucleotide, amino acid, or string data with this matrix produces highly compatible cladistic hypotheses. These studies point to (i) the commonality of information in different data forms representing the same evidence, and (ii) the power of simultaneous evaluation of all available evidence and the weakness of further production of rbcL gene trees (cf. Kluge, 1989; Barrett et al., 1991; Donoghue & Sanderson, 1992; Jones et al., 1993; Mishler, 1994).


THE RATE “PROBLEM” 


As has been pointed out in several recent papers, sequence change in the rbcL gene is not strictly clocklike (Albert et al., 1992a; Bousquet et al., 1992; Gaut et al., 1992; Clegg, 1993). Here, we provide a number of new comparisons (Table 1) based on patristic distances between woody taxon pairs from Search II of Chase et al. (1993). It is clear that our own estimates and those of other workers all fall within a very narrow range of absolutely low values. The mean rate per taxon pair here is approximately 2 × 10^-10 nucleotide substitutions per site per year; Wendel & Albert (1992) estimated 5-7 × 10^-10 for three herbaceous-pair comparisons. Lineage-specific rate differences were found by Bousquet et al. (1992) and in the relative-rate tests of Gaut et al. (1992), but absolute rate estimates do not differ substantially from our own findings. Thus, whereas rbcL data cannot be considered perfectly ultrametric (i.e., satisfying a clock assumption), the small range of absolute variation suggests that some predictions of the clock hypothesis still apply. For example, the relationship between time and the accumulation of nucleotide substitutions may be nearly linear. We term this condition, apparently characterizing rbcL sequence data, "quasi-ultrametric."
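The Table 1 rate calculation can be reproduced in a few lines. This is a minimal sketch, not the authors' procedure: the number of nucleotides compared is not stated in this section, so `N_SITES = 1428` (roughly the full rbcL coding length) is an assumption, chosen because it reproduces the tabulated pair rates to rounding. The function and constant names are hypothetical.

```python
def divergence_rate(d_patristic, n_sites, t_years):
    """Rate of sequence divergence for a taxon pair, as in the Table 1 caption.

    Per-site divergence (patristic distance D divided by the number of
    nucleotides compared) divided by time since cladogenesis. The result is in
    substitutions per site per year for the pair; halve it for a single lineage.
    """
    per_site_divergence = d_patristic / n_sites
    return per_site_divergence / t_years


# Assumption: 1,428 aligned rbcL positions compared (not stated in this section).
N_SITES = 1428

# (patristic distance D, divergence time in My) for three Table 1 pairs
PAIRS = {
    "Callitris / Widdringtonia": (55, 100),
    "Metasequoia / Sequoiadendron": (16, 40),
    "Betula / Casuarina": (35, 200),
}

for name, (d, time_my) in PAIRS.items():
    pair_rate = divergence_rate(d, N_SITES, time_my * 1e6)
    print(f"{name}: pair {pair_rate:.2e}, per lineage {pair_rate / 2:.2e}")
```

Under that assumption the pair rates come out near 3.85, 2.80, and 1.23 × 10^-10, the values listed in Table 1.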

Quasi-ultrametricity has several important implications. One is that the extent of sequence divergence in a given taxon sampling should roughly reflect the timing of underlying cladogenetic events. If all such events are ancient, extensive sequence differences among all taxa are to be expected (Fig. 1; cf. Donoghue & Sanderson, 1992, fig. 15.3). If some cladogenetic events are ancient whereas others are much more recent, expected sequence divergence in a data set would be prominently skewed (Fig. 2). As these properties become extreme, parsimony analysis will be hampered by the increased probability of parallel changes among either anciently diverged or divergence-time-asymmetric sequences (Figs. 1, 2; cf. Donoghue & Sanderson, 1992: 347-349). Given that A, T, G, and C are the only character-state alternatives, either scenario is likely to produce patterns of similarity that may be nonhomologous and therefore cladograms that are ahistorical. This is precisely the "long branches attract" issue raised by Felsenstein (1978) and others.
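The four-state limit can be made concrete with a small calculation. As an assumption for illustration only (the text above does not commit to a substitution model), this sketch uses the Jukes-Cantor model. The probability that two sequences separated by d expected substitutions per site carry the same base at a site falls toward 1/4, which is the level of identity expected by chance. The share of that identity arising at sites that have changed at least once (parallel or reversed changes) rises toward 1. The function names are hypothetical.

```python
import math


def p_identical(d):
    """Jukes-Cantor probability that two sequences separated by d expected
    substitutions per site show the same base at a site."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


def spurious_share(d):
    """Fraction of identical sites at which at least one substitution occurred
    along the path joining the two sequences (parallel or reversed change).

    exp(-d) is the probability that no substitution occurred at the site."""
    same = p_identical(d)
    return (same - math.exp(-d)) / same


for d in (0.05, 0.5, 1.0, 2.0, 5.0):
    print(f"d = {d:4.2f}: P(same) = {p_identical(d):.3f}, "
          f"share not from an unchanged site = {spurious_share(d):.3f}")
```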

Although asymmetrical rates of sequence change are often invoked to explain branch attraction behavior (see Clegg & Zurawski, 1992: 10, with reference to rbcL), the problem is better defined in terms of both rate and divergence time as their product, per-character change: the λ of Albert et al. (1992a, 1993; Albert & Mishler, 1992; cf. Hendy & Penny, 1989). With quasi-ultrametric data, rate asymmetry is unimportant in this regard; the time through which a branch exists becomes the central factor. As such, our expectation of the performance of parsimony analysis on rbcL data must include our ability to estimate both the absolute and relative timing of cladogenetic events inherent to particular data matrices. Of course, this may not always be possible.
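A worked form of this argument follows. The rate and times are illustrative values, not new estimates: the rate is the approximate per-lineage woody-taxon value discussed with Table 1, and the times span the 400-50 My range used in Figures 1 and 2. When rates are equal, the ratio of λ values reduces to the ratio of divergence times, so an eight-fold time asymmetry alone yields an eight-fold difference in expected change per site.

```latex
\begin{aligned}
\lambda_i &= r_i\, t_i \\
\frac{\lambda_1}{\lambda_2} &= \frac{r_1 t_1}{r_2 t_2} = \frac{t_1}{t_2} \qquad (r_1 = r_2) \\
\lambda_1 &= (1.0\times10^{-10})(4\times10^{8}) = 0.04 \\
\lambda_2 &= (1.0\times10^{-10})(5\times10^{7}) = 0.005
\end{aligned}
```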

An additional implication of quasi-ultrametricity is the near satisfaction of selective neutrality. A molecular clock is predicted by the neutral theory of molecular evolution; equal rates of mutation and fixation are the expectation (see Kimura, 1983; [...]).
TABLE 1. "Phylogenetic" estimation of total substitution rate for 19 woody-taxon pairs. The rate of sequence divergence was calculated as per-site divergence (the patristic distance, D, divided by the number of nucleotides compared) divided by time since cladogenesis (Albert et al., 1992a). Average rates for individual taxa are half of the values shown. Data are from Search II of Chase et al. (1993); systematic error associated with that analysis can be expected to affect all calculations equally. Divergence time assumptions are based upon geologic dates associated with vicariant disjunctions (with the exception of all Arecaceae comparisons, which follow from the arguments of Wilson et al., 1990).

Taxon pair (family) | Area | Divergence time assumption | D | Divergence rate (× 10^-10 subst./site/yr, taxon pair)
Callitris rhomboidea R. Br. ex Rich. / Widdringtonia cedarbergensis Marsh (Cupressaceae) | Australia / Africa | 100 My (a) | 55 | 3.85
Metasequoia glyptostroboides Hu & W. C. Chang / Sequoiadendron giganteum (Lindl.) J. Buchholz (Taxodiaceae) | Asia / N. America | 40 My (b) | 16 | 2.80
Illicium parviflorum Michx. ex Vent. / Austrobaileya scandens C. T. White (Illiciaceae / Austrobaileyaceae) | N. America, Asia / Australia | 200 My (c) | 54 | 1.89
Drimys winteri J. R. & G. Forst. / Belliolum sp. (Winteraceae) | S. America / New Caledonia | 100 My (a) | 21 | 1.47
Drimys winteri J. R. & G. Forst. / Tasmannia insipida DC. (Winteraceae) | S. America / Tasmania | 100 My (a) | 14 | 0.98
Canella winteriana (L.) Gaertn. / Belliolum sp. (Canellaceae / Winteraceae) | N. America / New Caledonia | 200 My (c) | 78 | 2.73
Canella winteriana (L.) Gaertn. / Tasmannia insipida DC. (Canellaceae / Winteraceae) | N. America / Tasmania | 200 My (c) | 67 | 2.35
Liriodendron tulipifera L. / Liriodendron chinense (Hemsl.) Sarg. (Magnoliaceae) | N. America / Asia | 40 My (b) | 10 | 1.75
Calycanthus chinensis Cheng & S. T. Chang / Idiospermum australiense (Diels) S. T. Blake (Calycanthaceae / Idiospermaceae) | Asia, N. America / Australia | 200 My (c) | 28 | 0.98
Chimonanthus praecox (L.) Link / Idiospermum australiense (Diels) S. T. Blake (Calycanthaceae / Idiospermaceae) | Asia / Australia | 200 My (c) | 24 | 0.84
Chamaedorea costaricana Oerst. / Drymophloeus subdistichus (H. E. Moore) H. E. Moore (Arecaceae) | Americas / S. Pacific | 60 My (d) | 15 | 1.75
Chamaedorea costaricana Oerst. / Nypa fruticans Wurmb (Arecaceae) | Americas / S. Pacific, India | 60 My (d) | 20 | 2.33
Serenoa repens (Bartram) Small / Drymophloeus subdistichus (H. E. Moore) H. E. Moore (Arecaceae) | Americas / S. Pacific | 60 My (d) | 18 | 2.10
Serenoa repens (Bartram) Small / Nypa fruticans Wurmb (Arecaceae) | Americas / S. Pacific, India | 60 My (d) | 23 | 2.68
Betula nigra L. / Casuarina litorea L. (Betulaceae / Casuarinaceae) | N. Hemisphere / Australia | 200 My (c) | 35 | 1.23
Nothofagus dombeyi (Mirb.) Oerst. / Nothofagus balansae (Baill.) Steenis (Nothofagaceae) | S. America / New Caledonia | 100 My (a) | 30 | 2.10
Galphimia gracilis Bartl. / Acridocarpus natalitius A. Juss. (Malpighiaceae) | S.-N. America (e) / Africa, Madagascar, New Caledonia | 100 My (a) | 34 | 2.38
Dicella nucifera Chodat / Acridocarpus natalitius A. Juss. (Malpighiaceae) | S. America / Africa, Madagascar, New Caledonia | 100 My (a) | 33 | 2.31
Mascagnia stannea (Griseb.) Nied. / Acridocarpus natalitius A. Juss. (Malpighiaceae) | S.-N. America (e) / Africa, Madagascar, New Caledonia | 100 My (a) | 34 | 2.38
Range | | | | 3.01
Mean ± SD | | | | 2.05 ± 0.75

(a) Standard time figure used to represent the breakup of Gondwana (rounded to the nearest 100 My (million years) [...], as estimated using Terra Mobilis 2.1 by C. R. Denham and C. R. Scotese; see Wendel & Albert, 1992: 137).
(b) Standard time figure (ca. early Oligocene) used to represent disruption of the boreotropical interchange between North America and Eurasia (see Lavin & Luckow, 1993).
(c) Standard time figure used to represent separation of the Northern and Southern Hemispheres upon the breakup of Pangaea (rounded to the nearest 100 My from 160 My, as estimated using Terra Mobilis 2.1 by C. R. Denham and C. R. Scotese; see Wendel & Albert, 1992: 137).
(d) [...] date used by Wilson et al. (1990), based on the fossil record.
(e) North American Malpighiaceae are here interpreted as representing range expansion from South America [...]


Quasi-ultrametric data may imply selection coefficients very close to neutrality. Recalling that the underlying premise of selective neutrality is the neutral effect of point mutations, clocklike sequence evolution should involve a large proportion of such changes, fixed as effectively neutral substitutions. Such substitutions would be expected to be mainly silent (i.e., synonymous with regard to amino acid) and, with regard to amino acid replacements, functionally conservative (i.e., labile). Quasi-ultrametricity in rbcL nucleotide sequences is thus an expected manifestation of strong constraints on protein function.
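A short derivation links this expectation to a clock. It uses the neutral-theory reformulation given in the footnote to this section (after Nei, 1987), with f the fraction of mutations eliminated by purifying selection and μ the mutation rate. The expected number of substitutions per site after time t is then linear in t whenever f and μ stay constant. Stronger functional constraint (larger f) lowers the slope, but it does not bend the line as long as f is stable through time.

```latex
\begin{aligned}
S &= (1 - f)\,\mu \\
\mathbb{E}[\text{substitutions per site after time } t] &= S\,t = (1 - f)\,\mu\,t
\end{aligned}
```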


UNIT CHARACTERS AND FUNCTIONAL CONSTRAINTS 


As recently reviewed by Clegg (1993), a number of systematic and evolutionary studies have relied solely on rbcL sequence variation. Such analyses make the implicit assumption that rbcL nucleotides are independent and potentially informative markers of cladogenetic events. As discussed above with respect to total rates of change, if all branching events under consideration are relatively recent, parsimony analysis may be expected to proceed with a reduced probability of spurious branch attraction because of the absolutely lower expected sequence divergence and the relatively lower associated likelihood of character-state parallelism. This "time-sampling" strategy has been employed in circumscribed studies ranging from particular angiosperm groups (e.g., Conti et al., 1993; Kron & Chase, 1993; Rodman et al., 1993) to seed plants as a whole (Chase et al., 1993). Here, a "time sample" refers to the nodes rather than the terminals on an imaginary tree; as such, a time sampling is the collection of absolute and relative timings of underlying cladogenetic events in a data matrix. Of course, the nodes of a cladogram are not discernible a priori to analysis, but their absolute and relative timing may be estimated by external criteria (e.g., the fossil record; cf. Norell & Novacek, 1992).

See Clegg (1993) on synonymous rates for rbcL; note that only total substitution rates are relevant to cladistic analysis, because all informative variation is considered.

Assuming that purifying selection eliminates mutations deleterious to protein function and that f is the fraction of such deleterious mutations, the neutral theory may be reformulated as $S = (1 - f)\,\mu$, where S is the total substitution rate per site and μ is the mutation rate (after Nei, 1987: 52, 411).

FIGURES 1 AND 2. Patterns of historical versus spurious similarity resulting from symmetrical versus asymmetrical time-samples. In both cases, time-sample refers to the nodes on these imaginary trees. In (1), the nodes are essentially time-coincident at 400 My, so the "true tree" appears polytomous. In (2), the nodes indicated occur asymmetrically with respect to time, ranging from 400 to 50 My since divergence. Patterns of nucleotide change are indicated by the filled and open rectangles; the former represent similarity reflecting cladogenetic history, whereas the latter represent spurious character-state similarity resulting, e.g., from parallel nucleotide substitutions. In (1), these patterns of similarity are approximately equal in extent (owing to clocklike substitutional behavior) but are in partial conflict with each other; parsimony analysis may include solutions containing some proportion of ahistorical evidence or even alternatives comprising totally spurious groupings. This might be the expectation if taxa A through E were, e.g., Isoetes, Selaginella, Psilotum, Equisetum, and [...]. In (2), which approximates the situation in simultaneous studies of sporing and seed plants, the problems of (1) are only partially alleviated. Patterns of convergent similarity between the oldest taxa, A and [...], may lead to parsimonious reconstructions that pair these taxa spuriously. As divergence time becomes shorter, the lower likelihood of multiple changes at sites will insure that D and E are paired historically. Although A [...] (D, E) by "true" similarity, this relationship may be broken by false similarities between B and C as well as [...] (C, D, E). In summary, comparing only anciently diverged lineages with rbcL may suggest patterns of relationship that represent a hopelessly even mixture of historically reliable and nonreliable similarity. Likewise, comparing ancient and recently diverged clades may have the same problem near the base while being relatively more consistent near the tips. This condition may characterize the rbcL-based results shown in this paper.

Initial attempts to analyze time samples beyond angiosperms and other seed plants (i.e., including rbcL sequences from sporing plants; Albert et al., 1992b) resulted in cladistic patterns familiar from studies based on ribosomal DNA (rDNA) variation (e.g., monophyletic gymnosperms or combinations of gymnosperm lineages, a seed-plant "root" at the Gnetales, an angiosperm "root" at the monocots; see Troitsky et al., 1991; Zimmer et al., 1989; Hamby & Zimmer, 1992). These results, however, are in conflict with cladistic studies based on morphological characters (see below). Ribosomal RNAs, with their structural and catalytic roles in protein translation, are obviously under strong functional constraints. Like rbcL, rDNAs may also exhibit nearly clocklike substitutional behavior at those positions that are "free" to vary. If absolute rates of change approximate the low values estimated for rbcL, analysis of similar time samples might be expected to produce corresponding patterns of homologous and spurious similarity, and therefore similar hierarchic reconstructions (cf. Donoghue & Sanderson, 1992: 349).
To gain insight into the topological effects of vastly asymmetrical time samples (see Figs. 1, 2), we have combined rbcL information from "bryophytes," "pteridophytes," "gymnosperms," and angiosperms (Table 2). If the substitutional process is effectively clocklike among these taxa, the effects of functional constraints in land-plant rbcL evolution should be discernible (as spurious branch attractions; see The Rate "Problem" above); we explore this cladistically using both the primary nucleotide data and ad hoc nucleotide strings. The rbcL data are also examined at the amino acid level for hierarchic compatibility with nucleotide and string evidence.

TABLE 2. rbcL sequences used for data transformation and cladistic analysis. These are listed by taxon and by GenBank accession number and/or literature reference where sequence data first appeared. Voucher information, where available, is given by these sources.

Taxon | GenBank accession or literature reference
Conocephalum conicum (L.) Lindb. | Mishler et al., 1994
Lophocolea heterophylla (Schrad.) Dumort. | Mishler et al., 1994
Anthoceros punctatus L. | Mishler et al., 1994
Andreaeobryum macrosporum Steere & B. Murray | Mishler et al., 1994
Ophioglossum engelmannii Prantl | L11058 (J. R. Manhart, in press)
Psilotum nudum (L.) P. Beauv. | L11059 (J. R. Manhart, in press)
Isoetes melanopoda J. Gay & Durieu | L11054 (J. R. Manhart, in press)
Lycopodium digitatum A. Br. | L11055 (J. R. Manhart, in press)
Angiopteris evecta (G. Forst.) Hoffm. | L11052 (J. R. Manhart, in press)
Equisetum arvense L. | L11053 (J. R. Manhart, in press)
Selaginella sp. | L11280 (J. R. Manhart, in press)
Botrychium biternatum (Sav.) Underwood | L13474 (J. R. Manhart, in press)
Taxus × media | Chase et al., 1993
Taxodium distichum (L.) Rich. | Soltis et al., 1992
Podocarpus gracilior Pilg. | X58135 (Bousquet et al., 1992)
Ginkgo biloba L. | Chase et al., 1993
Cycas revoluta L. | B. Schutzman, s.n., FLAS (M. W. Chase, unpublished)
Stangeria eriopus (Kunze) Baill. | Chase et al., 1993
Zamia inermis Vovides, J. D. Reese & M. Vásquez-Torres | L12683 (Chase et al., 1993)
Ephedra tweediana C. A. Mey. | L12677 (Chase et al., 1993)
Welwitschia mirabilis Hook. f. | Chase et al., 1993 (G. R. Furnier)
Gnetum gnemon L. | L12680 (Chase et al., 1993)
Chloranthus japonicus Siebold | L12640 (Chase et al., 1993)
Piper betle L. | L12660 (Chase et al., 1993)
(Drimys) Tasmannia insipida DC. | L01957 (Albert et al., 1992c)
Calycanthus chinensis Cheng & S. T. Chang | L12635 (Chase et al., 1993)
Eupomatia bennettii F. Muell. | L12644 (Chase et al., 1993)
Magnolia macrophylla L. | Golenberg et al., 1990
Persea americana Mill. | Golenberg et al., 1990
Trochodendron aralioides Siebold & Zucc. | L01958 (Albert et al., 1992c)
Ceratophyllum demersum L. | M77030 (Les et al., 1991) plus nucleotides 1184-1428 from Qiu et al., 1993
Nymphaea odorata Aiton | M77035 (Les et al., 1991) plus nucleotides 1184-1428 from Qiu et al., 1993
Lilium superbum L. | L12682 (Albert et al., 1992a)
Platanus occidentalis L. | L01943 (Albert et al., 1992c)
Caltha palustris L. | L02431 (Albert et al., 1992c)
Dillenia indica L. | L01903 (Albert et al., 1992c)
Chrysolepis (Castanopsis) sempervirens (Kellogg) Hjelmq. | Chase et al., 1993
Betula nigra L. | L01889 (Albert et al., 1992c)
Casuarina litorea L. | L01893 (Albert et al., 1992c)
Hamamelis mollis Oliv. | L01922 (Albert et al., 1992c)


NUCLEOTIDES 


The nucleotide is the smallest unit character available in DNA information. With only four states possible at any given site, nucleotide data are subject to parallelism among sequences when the number of changes per site, λ (= rate × time), becomes large. Unlike some morphological characters, nucleotide data are usually analyzed cladistically with no assumed transformation series (i.e., nonadditive steps; Fitch, 1971). For such procedures, Albert et al. (1993) examined the potential for spurious branch attraction under Felsenstein's (1978) simplified four-taxon scenario. State-change probabilities with Jukes-Cantor (Jukes & Cantor, 1969) and Kimura 2-parameter (Kimura, 1980) corrections for multiple changes at sites were considered in addition to uncorrected (observed) changes, because of the prospect of reducing character-state parallelisms. All calculations indicated a very small parameter region under which branch attraction could be expected, provided that λ values remained small (i.e., less than approximately 0.1; see Albert et al., 1992a). For quasi-ultrametric data, differences in λ values must principally result from differences in divergence time.
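For reference, the two corrections named above have standard closed forms. The sketch below computes them from observed proportions of differing sites. The function names are hypothetical, and the code is not the calculation of Albert et al. (1993), which concerned state-change probabilities on a four-taxon tree. The corrected distance always exceeds the observed proportion, and the gap widens as divergence grows. That widening gap is where hidden multiple hits, and hence potential parallelisms, accumulate.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor (1969) distance from the observed proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("JC distance undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(transitions, transversions):
    """Kimura (1980) 2-parameter distance from the observed proportions of
    transitional (P) and transversional (Q) differences."""
    P, Q = transitions, transversions
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if a <= 0 or b <= 0:
        raise ValueError("K2P distance undefined for these proportions")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


for p in (0.01, 0.1, 0.3, 0.5):
    print(f"observed p = {p:.2f} -> JC distance = {jukes_cantor(p):.4f}")
```

When transitions make up one third of all differences (the Jukes-Cantor expectation), the two formulas coincide.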

The bryophyte lineages examined here could easily be pre-Silurian; the pteridophytes no later than Devonian; the seed plants appearing by the Carboniferous; the angiosperms by the Cretaceous, followed by their diversification through the Tertiary. Together these span a time range of potentially 500-5 million years before present. Thus, even without a priori knowledge of precise divergence times, it is reasonable to approximate upper and lower λ-bounds from this range and our estimates of total sequence divergence. The mean rate for woody taxa (Table 1), averaged for single lineages by halving the divergence value, is approximately 1.0 × 10^-10 nucleotide substitutions per site per year. Similarly, the estimates for herbaceous taxa (Wendel & Albert, 1992) range between 2.5 and 3.5 × 10^-10. Assuming that bryophytes and pteridophytes fall into the range 1.0-3.5 × 10^-10 as well, λ values are estimated to lie between 0.05-0.175 (500 My) and 0.0005-0.00175 (5 My). On a four-taxon tree, some combinations of these values would yield spurious branch attractions (see Albert et al., 1993). Here, we are working with 40 taxa and a greater potential for inconsistent results (see Penny et al., 1991).
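The λ bounds quoted above follow from multiplying the per-lineage rate range by the time range. Here is a minimal sketch of that arithmetic. The constants come from the text, the 0.1 flag reflects the approximate threshold cited from Albert et al. (1992a), and the names are hypothetical.

```python
RATE_LOW = 1.0e-10   # subst./site/yr per lineage (woody-taxon mean, Table 1)
RATE_HIGH = 3.5e-10  # upper herbaceous estimate (Wendel & Albert, 1992)
SAFE_LAMBDA = 0.1    # approximate threshold cited from Albert et al. (1992a)


def lambda_bounds(time_my, rate_low=RATE_LOW, rate_high=RATE_HIGH):
    """Return (low, high) expected changes per site, lambda = rate * time,
    for a lineage persisting time_my million years."""
    years = time_my * 1e6
    return rate_low * years, rate_high * years


for time_my in (500, 5):
    low, high = lambda_bounds(time_my)
    flag = "exceeds" if high > SAFE_LAMBDA else "within"
    print(f"{time_my} My: lambda {low:.5f}-{high:.5f} ({flag} ~{SAFE_LAMBDA})")
```

The upper bound for 500 My lineages passes the 0.1 threshold, whereas 5 My lineages sit far below it.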


Data analysis. Nucleotide sequences (unambiguously aligned by sight and excluding the 30 5'-most positions, which incorporated only primer information for some taxa; Table 2) were analyzed with PAUP 3.1.1 (Swofford, 1993) using the Fitch criterion (Fitch, 1971; cf. Albert et al., 1993) with ACCTRAN (accelerated transformation) optimization (Farris, 1970; Swofford & Maddison, 1987). The heuristic search option was used with 100 random replicates of data addition sequence, COLLAPSE, MULPARS, and TBR (tree bisection-reconnection) branch-swapping. The consistency and retention indices (C and R, respectively; Kluge & Farris, 1969; Farris, 1989a) were also calculated. Five hundred fifteen nucleotide positions showed patterns of similarity among taxa.
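Fitch step counting and the two ensemble indices can be sketched compactly. This is an illustration, not PAUP's implementation, and it rests on several assumptions. Trees are rooted, binary, and written as nested 2-tuples of taxon names. Missing data are not handled. ACCTRAN is omitted because it changes only where changes are placed on the tree, not tree length. C is computed as M/S and R as (G - S)/(G - M), where S is observed steps, M minimum possible steps, and G steps on a star tree. The function names are hypothetical.

```python
from collections import Counter


def fitch_length(tree, states):
    """Fitch (1971) nonadditive step count for one character.

    tree: rooted binary tree as nested 2-tuples of taxon names.
    states: dict mapping taxon -> state (e.g. 'A', 'C', 'G', 'T').
    """
    def visit(node):
        if isinstance(node, str):                # leaf: its observed state
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        shared = left_set & right_set
        if shared:                               # intersection: no extra step
            return shared, left_steps + right_steps
        return left_set | right_set, left_steps + right_steps + 1  # union: one step
    return visit(tree)[1]


def ensemble_ci_ri(tree, characters):
    """Ensemble consistency (C) and retention (R) indices over characters."""
    s = m = g = 0
    for states in characters:
        counts = Counter(states.values())
        s += fitch_length(tree, states)
        m += len(counts) - 1                     # fewest steps on any tree
        g += len(states) - max(counts.values())  # steps on a star tree
    ci = m / s if s else 1.0
    ri = (g - s) / (g - m) if g != m else 1.0
    return ci, ri


tree = (("A", "B"), ("C", "D"))
congruent = {"A": "G", "B": "G", "C": "T", "D": "T"}
homoplastic = {"A": "G", "B": "T", "C": "G", "D": "T"}
print(ensemble_ci_ri(tree, [congruent, homoplastic]))
```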

Eight equally parsimonious cladograms were found (C = 0.362, including all data; R = 0.523). The strict and combinable component consensus trees (Bremer, 1990) were identical (see Fig. 3). All trees indicate that (i) hornworts are nested inside the tracheophyte clade, (ii) lycopods rather than ferns plus Equisetum represent the sister group to seed plants, (iii) Gnetales represent the sister group of all other seed plants, (iv) conifers, Ginkgo, and cycads form the monophyletic sister group to angiosperms, and (v) monocots are basalmost in the angiosperms, followed by Piper. Characteristics (ii) and (iv) are shared with the rDNA analysis of Hamby & Zimmer (1992) but not with the morphological analyses of Crane (1985), Doyle & Donoghue (1986, 1992), Loconte & Stevenson (1990), and Nixon et al. (1994). Characteristic (i) is in conflict with both morphological and molecular cladistic studies (Mishler & Churchill, 1985; Mishler et al., 1994, this issue). Characteristic (ii) contrasts both with morphological data (Bremer, 1985) and with the chloroplast genome structural findings of Raubeson & Jansen (1992) that link all tracheophytes except the lycopods, which have the plesiomorphic (i.e., liverwortlike) state. Characteristic (v) contrasts with the results of morphological (Donoghue & Doyle, 1989; Loconte & Stevenson, 1991; Taylor & Hickey, 1992) and some rDNA (Hamby & Zimmer, 1992; cf. Zimmer et al., 1989) analyses.
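The difference between the two consensus methods mentioned above can be illustrated with cluster sets. This is a minimal sketch, not PAUP's code. It assumes all trees share the same taxa and are written as nested tuples, which may be polytomous. It treats a combinable component, following Bremer (1990) as we read it, as any cluster from any input tree that no input tree contradicts. The function names are hypothetical.

```python
def clusters(tree):
    """Leaf sets of the internal nodes of a tree given as nested tuples."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def compatible(a, b):
    """Two clusters can coexist on one tree if they are disjoint or nested."""
    return not (a & b) or a <= b or b <= a


def strict_consensus(trees):
    """Clusters present in every tree."""
    return set.intersection(*(clusters(t) for t in trees))


def combinable_component_consensus(trees):
    """Clusters from any tree that are compatible with every cluster of every tree."""
    all_clusters = [clusters(t) for t in trees]
    pool = set().union(*all_clusters)
    return {c for c in pool
            if all(compatible(c, d) for cs in all_clusters for d in cs)}


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = (("A", "B"), "C", ("D", "E"))   # unresolved at the base
print(sorted(map(sorted, strict_consensus([t1, t2]))))
print(sorted(map(sorted, combinable_component_consensus([t1, t2]))))
```

Here the strict consensus drops the (A, B, C) group because the second tree leaves it unresolved, while the combinable component consensus keeps it because no tree contradicts it. When, as reported above, the two consensus trees are identical, no such uncontradicted-but-unshared groups exist.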


FIGURES 3-5. Combinable component consensus trees summarizing the results of parsimony analyses of rbcL evidence as (3) nucleotide, (4) string, and (5) amino acid data. For (3), the strict consensus is identical; for (4) and (5), the single combinable components are indicated by the percentage of most parsimonious trees that resolve them; these would otherwise be polytomies. Implications of the different topologies are discussed in the text.

Function and phylogeny. Needless to say, not all of the above observations can represent the truth about land-plant history. The groups found in the nucleotide-based parsimony analysis (Fig. 3) may well reflect historical reality, but the nature of that reality could be other than strictly phylogenetic. From our argument about nearly clocklike rates and the functional constraints that may produce them, it is reasonable to suppose that many or even all of the branchings depicted in Figure 3 may reflect primarily spurious similarities rather than phylogenetic homologies. We have assessed possible constraints on rbcL evolution by examining the amino acid changes implied on the internal branches of one of the eight equally most-parsimonious trees (Appendix I). As summarized in Table 3, over 84% of the inferred nucleotide substitutions on internal branches are silent with regard to amino acid identity. Nucleotide changes incurring functionally labile amino acid replacements (judged using the PAM-250 log-odds matrix of Dayhoff et al., 1978: 352; see Table 3) amount to approximately 13% more. Viewed as a whole, 97.5% of all synapomorphous nucleotide changes are expected to have little or no effect on protein function. With a maximum of only 2.5% of these changes incurring non-labile amino acid replacements of potential structural/functional distinction (see Table 3), rbcL sequences appear heavily burdened by forces leading to functional conservation. Thus, the challenge for land-plant cladistics is to determine how strongly functionally constrained variation may also reflect phylogenetic patterns.
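The silent/labile/non-labile tally behind Table 3 can be sketched as follows. Several assumptions apply. The code uses the standard genetic code, whose amino-acid assignments plastid genes such as rbcL share. It simplifies each inferred change to a (codon before, codon after) pair. Most importantly, the `PLACEHOLDER_LABILE` set is a hypothetical stand-in: it is not derived from the PAM-250 matrix that the authors used, and a real analysis would supply its own pairs. All names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Hypothetical placeholder; NOT taken from PAM-250.
PLACEHOLDER_LABILE = {frozenset("IV"), frozenset("IL"), frozenset("LV")}


def classify_change(codon_before, codon_after, labile_pairs):
    """Return 'silent', 'labile' or 'non-labile' for one codon change."""
    aa1, aa2 = CODE[codon_before], CODE[codon_after]
    if aa1 == aa2:
        return "silent"
    return "labile" if frozenset((aa1, aa2)) in labile_pairs else "non-labile"


def summarize(changes, labile_pairs):
    """Percentage of changes falling into each class."""
    counts = {"silent": 0, "labile": 0, "non-labile": 0}
    for before, after in changes:
        counts[classify_change(before, after, labile_pairs)] += 1
    total = len(changes)
    return {kind: 100.0 * n / total for kind, n in counts.items()}


example = [("CTT", "CTC"), ("ATT", "GTT"), ("GAT", "GGT"), ("GCA", "GCG")]
print(summarize(example, PLACEHOLDER_LABILE))
```

The silent share plus the labile share corresponds to the "little or no effect on protein function" fraction discussed above.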


STRINGS 


The ideal unit character in phylogenetic analysis is one that truly evolves as an independent unit, meaning one that independently undergoes transformations from one condition to another that are hierarchically correlated (i.e., congruent; cf. Farris, 1969) with those of other such characters. For molecular data, this may often be the individual nucleotide, but possibly also a contiguous length of DNA in an insertion/deletion event, several noncontiguous nucleotide positions that are functionally associated (e.g., because of higher order RNA or protein structure), a unique codon for a functionally constrained amino acid, or a whole chromosome in a karyological change. It is of course difficult to assess such possibilities a priori, but it is nonetheless important to begin to develop methods to examine the issue empirically.

We have thus examined some means by which the functional/phylogenetic evidence manifest in a given set of rbcL sequences might be represented by data forms other than nucleotide positions and their character states. The nucleotide is indeed the smallest unit character in rbcL evidence, but it is not necessarily the most informative nor the most consistent. First, nonadditive optimization of multistate characters may restrict potential topological resolution (e.g., a 4-state, nonadditive character can have minimum homoplasy if optimized as three autapomorphies). Additionally, direct analysis of nucleotide sequences from protein-coding genes ignores constraints imposed both by the genetic code and by protein function; codon positions may be both intra- and inter-correlated (Fitch & Markowitz, 1970; Fitch, 1986).

Patterns of codon usage intrinsic to the primary nucleotide matrix are also suggestive of functional constraints; these are discussed in a separate paper (Albert, Backlund & Bremer, in press).

A data transformation that may overcome these shortcomings stems from the early comparison of oligonucleotide catalogues (and even whole chromosomes; see Farris, 1978; Fox et al., 1980; Bremer & Bremer, 1989) prior to the DNA sequencing revolution: the production of ad hoc nucleotide strings. Our procedure (analogous to generating mapped restriction site data) may be outlined thus: (i) generate strings of random A, T, G, and C content varying randomly in size between 6 and 21 base pairs (so that a minimum of two and a maximum of seven codons are included); (ii) scan rbcL sequence data for the presence/absence of given strings; (iii) record recognitions by both base position and taxon; (iv) treat multiple positional recognitions by a given search string separately; and (v) treat all recognitions found in two or more taxa as binary characters for cladistic analysis (sequences that have missing information at a string position are coded accordingly). Another procedure for producing string data from nucleotide sequences has been developed by J. S. Farris (unpublished): sequences are subdivided into a prespecified number of string characters ("supersites"), each of which is assigned as many states as necessary to explain the observed variation. Unlike the approach used here (see below and Appendix II), Farris's method guarantees a complete transformation of the entire sequence as well as the non-overlap of string characters.
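Steps (i)-(v) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' program: sequences are assumed to be aligned to common coordinates (so that a recognition's start position is comparable across taxa), and a taxon is coded missing ("?") when the window it would need to match contains "?", "N", or "-" or runs past the end of its sequence. The function names, the seed, and the toy sequences are hypothetical.

```python
import random


def random_strings(n, min_len=6, max_len=21, seed=0):
    """Step (i): n random A/T/G/C strings with lengths drawn from [min_len, max_len]."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ATGC") for _ in range(rng.randint(min_len, max_len)))
            for _ in range(n)]


def find_positions(seq, probe):
    """Every start position (0-based, overlaps allowed) of probe in seq."""
    hits, start = [], seq.find(probe)
    while start != -1:
        hits.append(start)
        start = seq.find(probe, start + 1)
    return hits


def string_characters(seqs, probes):
    """Steps (ii)-(v): binary characters keyed by (probe, start position)."""
    characters = {}
    for probe in probes:
        # (ii)-(iii): record each recognition by base position and taxon
        recognized = {}
        for taxon, seq in seqs.items():
            for pos in find_positions(seq, probe):
                recognized.setdefault(pos, set()).add(taxon)
        # (iv): each position recognized by this probe is a separate character
        for pos, taxa in sorted(recognized.items()):
            if len(taxa) < 2:  # (v): keep recognitions shared by two or more taxa
                continue
            column = {}
            for taxon, seq in seqs.items():
                window = seq[pos:pos + len(probe)]
                if taxon in taxa:
                    column[taxon] = "1"
                elif len(window) < len(probe) or any(b in "?N-" for b in window):
                    column[taxon] = "?"  # missing information at this position
                else:
                    column[taxon] = "0"
            characters[(probe, pos)] = column
    return characters


# Hypothetical toy alignment; taxon D has missing data in positions 4-5.
seqs = {"A": "ATGGCACCA", "B": "ATGGCACCA", "C": "ATGTCACCA", "D": "ATG??ACCA"}
for key, column in string_characters(seqs, ["ATGGCA", "CACC"]).items():
    print(key, column)
```

Because every positional recognition becomes its own character, two probes can identify overlapping nucleotides, which is the non-independence discussed under Data analysis below.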

The net effect of transforming sequences into strings is twofold: (i) it incorporates more information (in terms of nucleotides or codons spanned) in a larger unit character, and (ii) it reduces the probability that independent gains of the same character-state are represented in the data (although, in parsimony analyses, binary characters are more subject to spurious branch attraction than are nonadditive multistate characters: Albert et al., 1993). As with mapped restriction site data, the probabilities of gain versus loss of a given string are highly asymmetrical, with parallel gains the least likely transformation series ([...] 1983; DeBry & Slade, 1985; Albert et al., [...]). Therefore, string data may contain historical markers much less likely to engage in branch attraction (which occurs because of accumulated parallelisms; cf. Felsenstein, 1978; Hendy & Penny, 1989;
TABLE 3. Analysis of character support for internal branches of tree #1 (of 8) from the nucleotide analysis. "Node" refers to the node numbers on the reference tree of Appendix I. "# changes" refers to the total number of nucleotide changes optimized onto a branch. "Constant" indicates that the nucleotide site belongs to a codon position that codes for the same amino acid throughout the entire matrix. "No change" indicates that the nucleotide site belongs to a codon position that codes for two or more amino acids across the matrix, but that the particular change inferred at this node does not alter the amino acid sequence. "Labile" means that the amino acid change implied by the observed nucleotide change is as likely as, or more likely than, expected by random chance (according to the PAM-250 log-odds matrix of Dayhoff et al., 1978: 352). "Potentially nonlabile" indicates that at least one of the potential amino acid changes inferred from a particular nucleotide position is less likely than random chance, but that other changes in the same character are as likely as, or more likely than, random chance. "Nonlabile" means that all inferred amino acid changes (often only one) occur at less than random chance.

                                                    Potentially
Node      # changes   Constant   No change   Labile   nonlabile   Nonlabile
78-77            42         22           4        8           5           3
77-76            24         13           6        4           0           1
76-71            27         13           9        3           2           0
71-70            29         19           9        1           0           0
70-42            40         24          11        5           0           0
42-41            33         26           5        1           0           1
70-69            42         17          16        8           0           1
69-66            29         21           8        0           0           0
66-48            34         15          13        5           0           1
48-44            25         10          12        2           0           1
44-43            29         19           8        2           0           0
48-47            15          7           8        0           0           0
47-46            24         14           7        3           0           0
46-45            11          4           4        3           0           0
66-65            56         34          15        7           0           0
65-64            26         13          10        3           0           0
64-63            18         11           6        1           0           0
63-54             5          2           0        3           0           0
54-53             4          3           0        1           0           0
53-51            10          3           1        5           1           0
51-49             9          4           2        3           0           0
51-50             8          2           1        5           0           0
53-52            11          5           2        4           0           0
63-62            16         11           5        0           0           0
62-61            14          6           7        1           0           0
61-59             8          2           4        2           0           0
59-58            17          8           5        4           0           0
58-57            13          6           4        3           0           0
57-56            33         20           6        7           0           0
56-55             6          3           2        1           0           0
61-60             8          5           2        1           0           0
69-68            58         29          18        8           3           0
68-67            45         24          17        4           0           0
76-75            34         20           7        4           0           3
75-74            38         23          12        2           1           0
74-73            45         28          14        3           0           0
73-72            65         43          12        9           1           0

Total           951        529         272      126          13          11
Percent      100.00      55.63       28.60    13.25        1.37        1.16

Silent (Constant + No change): 84.23%
Albert et al., 1992a, 1993) and much more likely
to contain "blocks" of evolutionarily correlated 
information. Nevertheless, this information could 
be functionally constrained, as with primary nu- 
cleotide data. This possibility can be studied sim- 
ilarly by examining inferred amino acid changes 
on cladograms; each string character is easily traced 
to its recognized codons and component nucleo- 
tides. 
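The percentage row of Table 3, and the approximately 97.5% of changes that the Abstract describes as synonymous or functionally labile, follow directly from the column totals. The short Python check below recomputes them; its only input is the totals row, with the Nonlabile total taken as 11, the value that makes the five categories sum to 951.

```python
# Column totals of Table 3: 951 nucleotide changes optimized onto the
# internal branches of nucleotide tree #1 (of 8).
TOTAL_CHANGES = 951
totals = {
    "Constant": 529,
    "No change": 272,
    "Labile": 126,
    "Potentially nonlabile": 13,
    "Nonlabile": 11,
}
# The five categories partition every inferred change.
assert sum(totals.values()) == TOTAL_CHANGES


def percent(count):
    """Percentage of all inferred changes, rounded as in Table 3."""
    return round(100 * count / TOTAL_CHANGES, 2)


by_category = {name: percent(n) for name, n in totals.items()}
# Silent changes: amino acid identity unaffected (Constant + No change).
silent = percent(totals["Constant"] + totals["No change"])
# Silent changes plus functionally labile replacements.
silent_or_labile = percent(totals["Constant"] + totals["No change"] + totals["Labile"])

print(by_category)
print(silent, silent_or_labile)
```

The second printed pair gives 84.23 (the silent fraction) and 97.48, the figure rounded to 97.5% in the Abstract.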

Data analysis. One thousand random strings 
were generated for evaluation (see Appendix II). 
After scanning the 40 rbcL sequences, 193 posi- 
tionally distinct string recognitions were recorded 
(mostly from small strings, the largest being from 
a 15-mer; see Appendix II). Of these, 112 iden- 
tified two or more taxa. As there was no control 
in our procedure for string overlap, a number of 
string recognitions are non-independent with re- 
spect to nucleotides identified (see Appendix II). 
Therefore, our string data carry an experimental 
bias similar to what could occur with restriction 
site data representing mapped cleavage points for 
several endonucleases. The "supersites" string
transform (J. S. Farris, unpublished) avoids this 
difficulty entirely, and if modified for the production 
of presence/absence data, would be identical to 
our intent but superior in execution. Nevertheless, 
our string data should suffice to explore biological 
non-independence of nucleotides (functional con- 
straints); in fact, partial replication of nucleotide 
"blocks" could enhance detection of conserved 
regions. Cladistic analysis of the string characters 
was performed under the Wagner criterion (Kluge 
& Farris, 1969; Farris, 1970; see Albert et al., 
1992a) using the same program and options mentioned previously; 165 equally parsimonious trees were found (C = 0.381, including all data; R = 0.524). The combinable component consensus tree
differs from the strict by only one component (see 
Fig. 4). 

The string data provide a different resolution of 
land-plant relationships than the nucleotide se- 
quences (Figs. 3, 4). Notable differences include 
(i) Equisetum placed among the bryophytes, (ii) 
paraphyly of Psilotum + ferns and paraphyly of 
lycopods, (iii) sister-group status of Gnetales to 
angiosperms (with Piper basalmost), and (iv) par- 
aphyly of angiosperms to conifers + (Ginkgo, cy- 
cads). Characteristics (i) and (iv) are in total conflict 
with other results (listed under Nucleotides, above), 
whereas (ii-iii) are not. 


Function and phylogeny. It could be argued 
that cladograms produced from string-transformed 
data are better phylogenetic representations than 


those derived from nucleotides because the unit 
character is substantially less subject to parallel 
gains (see above). However, this attribute is distinct 
from the nature of the history conserved by string 
data; whole functional units may be incorporated 
into single characters. Gross differences in tree 
topology (including paraphyly of angiosperms) may 
simply result from different representations of func- 
tional and phylogenetic history in string versus 
nucleotide data forms. 

We have studied possible functional constraints 
on rbcL evolution (as above) by examining the 
inferred amino acid changes on the internal branch- 
es of one of the 165 equally most-parsimonious 
string trees (Appendix II). Striking differences from 
the nucleotide-based analysis (Table 3) are shown 
in Table 4: only 45% of string transformations (changes in underlying nucleotide sequence) are silent with regard to amino acid identity (versus ca. 84% in the nucleotide analysis, a decrease of roughly half), and functionally labile amino acid replacements amount to an additional 25% (versus ca. 13% in the nucleotide analysis, a relative increase). Thus, 70% of underlying nucleotide changes appear to be functionally neutral, whereas non-labile amino acid replacements amount to a [...] of 28% (an additional 2.1% are ascribed to internal stop codons, which may result from sequencing errors). This greater number of presumably functional changes in underlying nucleotides does indicate a greater chance that functional associations among particular nucleotides may bias tree construction.

The different substitutional patterns between nucleotide and string data can be explained by inherent properties of the latter. Each string recognition shared by two or more sequences comprises much more inclusive and conservative information than shared nucleotide identity at a given site. Given our previous arguments about functional constraints in rbcL sequence evolution (see The Rate "Problem" and Nucleotides, above), most string recognitions are expected to identify functionally conserved nucleotide motifs. The proportional reduction in discernible silent substitutions, relative to the nucleotide level, is likely due to the larger size of the functional units compared: with a 6-base-pair string, the chance of observing a nonsilent change is at least six times greater than at a single nucleotide position. The proportional increase in labile amino acid replacements can be explained through similar reasoning; if a string recognition identifies a functionally conserved motif, the larger the motif, the greater the likelihood that functional preservation need not require


Volume 81, Number 3 Albert et al. 545 
1994 Functional Constraints and rbcL Evidence 





TABLE 4. Analysis of character support for internal branches of tree #100 (of 165) from the string analysis. "Node" refers to the node numbers on the reference tree of Appendix II. "# changes" refers to the total number of string changes optimized onto a branch. "Constant" indicates that the string identifies codon positions that code for the same amino acid throughout the entire matrix. "Labile" means that the amino acid change implied by the observed change in string recognition is as likely as, or more likely than, expected by random chance (according to the PAM-250 log-odds matrix of Dayhoff et al., 1978: 352). "Potentially nonlabile" indicates that at least one of the potential amino acid changes inferred from a particular string recognition is less likely than random chance, but that other changes in the same character are as likely as, or more likely than, random chance. "Nonlabile" means that all inferred amino acid changes (often only one) occur at less than random chance. "Internal stop" refers to string recognitions that identify internal stop codons, which may be sequencing artifacts.
amino acid identity. Strings recognizing regions of non-labile change, indicating potentially radical changes in structure and function among taxa, may represent another class of conserved information. Again, these are probably found in greater proportion because of the larger size of the unit characters. Rather than being conserved because of functional constraints (as above), such recognitions may identify conserved markers for historical
groups. Such changes may or may not have drastic 
physiological effects (see Hudson et al., 1990, on 
rbcL; cf. Perutz & Lehman, 1968; Nei, 1987:
270-271), but they could be of similar phyloge- 
netic utility as chloroplast DNA rearrangements 
(e.g., Jansen & Palmer, 1987; Palmer et al., 1988; 
Bruneau et al., 1990; Lavin et al., 1990; Downie 
& Palmer, 1992; Downie et al., 1991; Raubeson 
& Jansen, 1992) if well characterized in relation 
to the crystal structure of the large-subunit protein 
(Chapman et al., 1988; Andersson et al., 1989; 
cf. Clegg, 1993). 


AMINO ACIDS 


Because rbcL nucleotide substitutions approxi- 
mate a clock hypothesis (see The Rate “Problem,” 
above), amino acid changes are expected to con- 
form to the neutral hypothesis of molecular evo- 
lution (see Nei, 1987: 47-59, 409-412), although 
we do not directly address this issue here. Direct 
inference of trees can proceed from amino acids 
(yet another transformation of the same primary 
evidence). One limitation of using the amino acid 
sequences themselves is the “factoring-out” of all 
synonymous variation at the nucleotide level; this 
again may make it more likely that functional as- 
sociations among characters may bias tree con- 
struction. Topological resolution may also be lim- 
ited because amino acid data is optimized 
nonadditively (Fitch, 1971) and more than four 
states could be available for given characters (in 
the rbcL sequences examined here, the maximum 
is six states at four different positions). Neverthe- 
less, the greater the number of character states, 
the lower the probability of character-state paral- 
lelism and spurious branch attraction (Albert et al., 
1993). It could thus be argued that amino acid 
data might be more suitable for bridging large 
evolutionary time gaps, given a roughly constant 
rate of substitution combined with ignorance of 
potentially multiple synonymous nucleotide changes. 
Hence, we evaluated the amino acid data for hi- 
erarchic compatibility with the results of the nu- 
cleotide and string analyses. 
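Fitch (1971) optimization, the nonadditive criterion referred to above, can be sketched in a few lines of Python. This is a minimal sketch for a single character on a rooted, fully bifurcating tree written as nested pairs; it returns the minimum number of steps. Taxon labels and states in the usage lines are hypothetical. The second example illustrates the point made earlier about multistate characters: four taxa with four different states cost three steps on every topology, so such a character cannot discriminate among trees.

```python
def fitch(tree, states):
    """Fitch (1971) nonadditive optimization of one character.

    tree:   a taxon name, or a 2-tuple (left subtree, right subtree)
    states: dict mapping taxon name -> set of possible states
            (a set with several states can represent missing or ambiguous data)
    Returns (state set assigned to the root of `tree`, minimum number of steps).
    """
    if isinstance(tree, str):
        return set(states[tree]), 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    # No state in common: take the union and count one change.
    return left_set | right_set, left_steps + right_steps + 1


def steps(tree, states):
    return fitch(tree, states)[1]


# Hypothetical amino acid character scored for four taxa.
grouping = {"t1": {"A"}, "t2": {"A"}, "t3": {"L"}, "t4": {"L"}}
print(steps((("t1", "t2"), ("t3", "t4")), grouping))  # groups t1+t2: 1 step
print(steps((("t1", "t3"), ("t2", "t4")), grouping))  # conflicts: 2 steps

# Four taxa, four states: three steps whatever the topology.
all_distinct = {"t1": {"A"}, "t2": {"L"}, "t3": {"R"}, "t4": {"I"}}
for topology in [(("t1", "t2"), ("t3", "t4")), ((("t1", "t3"), "t2"), "t4")]:
    print(steps(topology, all_distinct))
```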


Data analysis. After "translating" the 40 rbcL sequences, 66 of the 476 amino acid positions identified two or more taxa. Cladistic analysis of these characters was performed under the Fitch criterion (Fitch, 1971) using the same program and options mentioned previously; 104 equally parsimonious trees were found (C = 0.567, including all data; R = 0.554). The combinable component consensus tree preserved one more component than the strict consensus (see Fig. 5).
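The translation step, and a tabulation of the resulting amino acid characters, can be sketched as follows. This Python sketch is our illustration, not the authors' program: it assumes in-frame, aligned sequences under the standard genetic code, codes any codon containing a symbol other than A/C/G/T as missing ("?"), and applies the usual parsimony-informative criterion (at least two states, each shared by two or more taxa), which may differ from the criterion behind the 66 positions reported above. The protein sequences in the example are invented.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order, first base varying slowest; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame sequence codon by codon; codons with any
    symbol other than A/C/G/T become '?', stop codons become '*'."""
    seq = seq.upper()
    return "".join(CODE.get(seq[i:i + 3], "?") for i in range(0, len(seq) - 2, 3))


def informative_positions(proteins):
    """0-based positions with at least two amino acid states that are each
    shared by two or more taxa (missing '?' ignored)."""
    length = min(len(p) for p in proteins.values())
    keep = []
    for i in range(length):
        column = [p[i] for p in proteins.values() if p[i] != "?"]
        shared = [aa for aa in set(column) if column.count(aa) >= 2]
        if len(shared) >= 2:
            keep.append(i)
    return keep


def max_states(proteins):
    """Largest number of distinct amino acids observed at any position."""
    length = min(len(p) for p in proteins.values())
    return max(len({p[i] for p in proteins.values()} - {"?"}) for i in range(length))


proteins = {"a": "MKAL", "b": "MKAV", "c": "MRAL", "d": "MRSV"}
print(informative_positions(proteins), max_states(proteins))
```

A 1428-base reading frame translates to 476 positions, matching the number of amino acid positions given above.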

The amino acid data provide yet another reso- 
lution of land-plant relationships (cf. Figs. 3, 4): 
(i) lycopods are polyphyletic, with Isoetes sister to 
Angiopteris, (ii) Anthoceros is embedded among 
fern allies, (iii) gymnosperms as a whole (with co- 
nifers polyphyletic) are the monophyletic sister 
group to angiosperms (with Nymphaea basalmost), 
and (iv) Lilium is sister to Dillenia. Except for 
gymnosperm monophyly as hypothesized from 
rDNA data (see Troitsky et al., 1991) these char- 
acteristics are in total conflict with all previous 
studies (listed under Nucleotides, above). 

With congruence as the arbiter, large-subunit amino acid data are no more appropriate for bridging gaps in asymmetric time samples than nucleotide or string data. As argued above, the clocklike behavior of rbcL nucleotide substitution is expected to obtain also in the translated amino acid data; thus, λ values for amino acid changes (and so the likelihood of spurious branch attraction) should also be sensitive to differences in divergence times.


Function and phylogeny. Amino acid changes 
in rbcL are apparently subject to strong functional 
constraints (see Nucleotides and Strings, above). 
One could argue that amino acid data is less subject to the "noise" of neutrality, i.e., multiple silent changes at given nucleotide positions. Yet selective neutrality may be roughly maintained by labile amino acid replacements, which could similarly "wobble" back and forth across evolutionary time. Only a small percentage of individual amino acids appears to be involved in function-changing evolutionary events (see Nucleotides, above).


PENULTIMATE CONCLUSIONS 


We have demonstrated the problematic, functionally constrained nature of the rbcL sequences currently being used for phylogeny estimation by [...] workers. Three transformations of the same evidence produced discordant cladistic topologies with substantial incongruence with previous morphological cladistic results. Of course, we do not propose that the growing rbcL database be abandoned. Rather, we suggest (as will be elaborated below) that all investigators involved with rbcL or other gene data take heed of standard and [...] cladistic procedures for discriminating cladistic history (homology) from homoplasy (functional parallelism and reversal).
TOTAL EVIDENCE AND CHARACTER CONGRUENCE
(I) ON CHARACTERS 


Every character in a data matrix showing sim- 
ilarity between two or more taxa is optimized under 
parsimony as a discrete and independent piece of 
information. This holds whether or not the char- 
acter represents a single taxic homology or only a 
portion of one (which is the case with correlated 
or contingent characters). A taxic homology used 
in parsimony analysis is expected to have a single 
functional history (even if this history changes over 
time; see Riedl, 1978; Donoghue, 1989; Donoghue 
& Sanderson, 1992); its cladistic utility (i.e., optimization as synapomorphy or homoplasy) is tested
at maximum parsimony along with all other char- 
acters in a matrix. From our argument about shared 
functional history (constraints) in the evolution of 
rbcL, one might be tempted to equate a given taxic 
homology (e.g., nuclear versus cellular endosperm 
development) with the entire rbcL gene. However, 
unlike a given taxic homology, rbcL is composed 
of multiple, discrete points of information, that is, 
its ca. 1428 nucleotides. To a parsimony algorithm, each of these data points is equivalent to the single, nonadditive taxic homology statement "functional pollen unit in the Orchidaceae: monad, tetrad, massula, or pollinium," whatever its underlying complexity.

Hence, some workers have found cladistic philosophy and methodology at an impasse. For example, it has been argued that gene information could be combined with other characters either through multistate recoding of gene trees (Doyle, 1992) or through analysis of component compatibility among separately produced cladograms (Page, 1993). Legitimate concern over potentially separate phylogenetic histories led to these suggestions, but we argue below that both approaches unnecessarily restrict the information content of cladistic hierarchies, a feature fundamental to the superiority of parsimony methods (see Farris, 1979, 1983); in fact, parsimony itself arbitrates the supposed analytical quandary.


(II) ON EVIDENCE


For cladistic analysis, evidence is the body of [...] information that shows patterns of similarity among terminals. A specific set of evidence may be expressed in different forms; we have shown this property through different data transformations of the rbcL gene (above). Approaches that combine evidence in the form of tree components do so at the cost of information content (for recent


debate on this issue, see Jones et al., 1993; Nelson, 
1993; Barrett et al., 1993; De Queiroz, 1993). In 
fact, acceptance of parsimony as the arbiter of 
synapomorphy and homoplasy seems methodolog- 
ically counterintuitive to component combination, 
which does not directly use such information (see 
Doyle, 1992; Page, 1993). Parsimony, acting over 
all evidence, will provide estimates of congruence 
among character-state patterns while minimizing 
ad hoc assumptions (Farris, 1983). For example, 
some characters from a multigene family (gene 
duplication being part of the functional burden) 
may not show congruence with the body of retained 
synapomorphy because of paralogous histories 
(Fitch, 1970). Nevertheless, analysis of "total" 
evidence (sensu Kluge, 1989) gives each data point 
the opportunity both to affect hierarchy directly 
and to be diagnosed objectively, which is not the 
case when evidence is decomposed a priori and 
later combined or reconciled (cf. Doyle, 1992; 
Page, 1993). In conclusion, although a functionally 
constrained DNA sequence like the rbcL gene may 
appear to deserve the same rank as a given mor- 
phological character, it is more evidence-rich, and 
all of this evidence can be examined for hierarchic 
correlation (sensu Farris, 1969) with other data. 


(III) AN EXAMPLE


The extent to which rbcL evidence shows hi- 
erarchic correlation with other evidence should 
provide an objective measure of its freedom from 
biasing functional considerations and, consequently, its phylogenetic utility. In this context, we
examined character interaction between rbcL ev- 
idence and the primarily morphological seed-plant 
matrix of Nixon et al. (1994). Using the set of 
functional histories in the morphological matrix as 
a "constant," we tested the ability of different rbcL 
data forms (i.e., nucleotides, strings, and amino 
acids) to produce a unified representation of the 
same evidence. Two different sets of experiments 
were performed: (i) analyses including fossil taxa 
for which rbcL evidence is lacking (and therefore 
coded as missing data), and (ii) analyses of data 
for extant taxa only (the intersection of available 
evidence). To measure character congruence, we 
have used the retention index: the proportion of 
congruent similarity (i.e., synapomorphy) in a data 
matrix that is retained at maximum parsimony (see 
Farris, 1989a, b, 1991). Although retention is not 
directly comparable among different data matrices 
(see Goloboff, 1991), each matrix within our respective sets of experiments shares the same "constant." Additionally, each data transform of rbcL
TABLE 5. Homoplasy and character congruence statistics for total evidence analyses comprising morphological (Nixon et al., 1994; matrix version as of 8 November 1993) and rbcL data. Consistency (over all data) and retention indices are listed (see text), along with the number of trees found (see Figs. 6-8). For comparisons involving both fossil and extant taxa, 101 morphological similarities are relevant (symbolized by "N"); for extant taxa only, there are 96 (symbolized by "N_ext"). The numbers of relevant similarities for each rbcL data transform (nucleotides, strings, amino acids) are given in the text. For analyses including fossil taxa, rbcL evidence was represented as missing (i.e., "?").

                          Consistency   Retention   # Trees
Fossil plus extant taxa
  N + nucleotides               0.450       0.625        44
  N + strings                   0.402       0.685        22
  N + amino acids               0.467       0.710       309
Extant taxa only
  N_ext + nucleotides           0.464       0.601         3
  N_ext + strings               0.442       0.641         7
  N_ext + amino acids           0.518       0.670        24

is assumed to be evidentially equivalent until shown otherwise (this assumption is obviously weaker for the string data, as they do not represent a completely saturated transformation of the nucleotide sequences). Finally, we do not use retention to suggest which analysis(es) may be "better."

The characters and cladistic reconstructions for living and fossil seed plants are described elsewhere (Nixon et al., 1994). We used the same parsimony methods outlined above to examine six combined matrices comparing all versus extant-only taxa and nucleotide/string/amino-acid rbcL data in all combinations. Consistency and retention indices for each analysis are reported in Table 5, and topological results are summarized in Figures 6-8. Character congruence, as measured through retention, is similar in magnitude (range < 0.1) across each set of experiments. Although topological resolution and component placements differ somewhat with respect to the rbcL data form used (Figs. 6-8; see Nixon et al., 1994), the rbcL evidence appears to be making a consistent statement along with the morphological evidence.

With respect to extant taxa, monophyletic cycads are the most topologically ancestral in all analyses including fossils (Figs. 6a-8a). Ginkgo appears either external to Cordaites plus conifers (Figs. 6a, 7a) or monophyletic with these taxa (Fig. 8a). In extant-only analyses, Ginkgo similarly intercalates between cycads and conifers (Figs. 6b, 7b) or remains sister to conifers (Fig. 8b). Conifers themselves are monophyletic in most combined analyses (Figs. 6a, b, 7a, 8a, b), but are partially unresolved in the extant-only analysis with string data (Fig. 7b). Every analysis resolves the Gnetales
and Bennettitales as sister to the angiosperms. 
Ephedra is uniformly sister to Gnetum plus Wel- 
witschia, but resolution within Bennettitales is pro- 
vided only in the combined analysis with amino 
acid data (Fig. 8a). Ceratophyllum is placed sister 
to all other angiosperms (see Les, 1988; Chase et 
al., 1993; Qiu et al., 1993) in the combined nu- 
cleotide and string analyses (Figs. 6a, b, 7a, b), 
but not in the combined amino acid analyses (Fig. 
8a, b), where it either nests well within angiosperms 
(sister to Chloranthus; Fig. 8a) or remains unre- 
solved (Fig. 8b). Indeed, relationships within the 
angiosperms are the least stable among the com- 
bined data analyses. Woody magnoliids occupy the 
basalmost branches in Figure 6a, whereas the "paleoherb" taxon Nymphaea occupies this position in Figure 7a, and all other analyses are indecisive on this point. Eudicots (angiosperms with triaperturate or triaperturate-derived pollen; here, Platanus, Caltha, Trochodendron, Dillenia, Hamamelis, Chrysolepis, Betula, Casuarina) are
monophyletic in the combined nucleotide and string 
analyses (Figs. 6a, b, 7a, b) (see Chase et al., 1993) 
but are polyphyletic in the combined amino acid 
analyses (Fig. 8a, b). For further discussion and 
reference to cladograms based solely on the mor- 
phological evidence, see Nixon et al. (1994). 
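The two indices reported in Table 5 can be computed from per-character quantities. In the usual formulation (Farris, 1989a, b), m is the minimum number of steps a character could require, s the number it requires on the tree, and g the number it would require on a completely unresolved tree; the ensemble consistency index is Σm/Σs and the retention index is (Σg − Σs)/(Σg − Σm). The Python sketch below assumes unordered (nonadditive) characters, treats "?" as missing, and takes the tree lengths s as given (for example, from a parsimony program); the data are invented for illustration. For instance, a two-state character with c = 0.167, the value listed for character 486 in Appendix I, would correspond to six steps.

```python
from collections import Counter


def bounds(column):
    """(m, g) for one unordered character.

    m: minimum possible steps = number of observed states - 1
    g: steps on a completely unresolved tree = scored taxa - count of the
       most frequent state
    """
    scored = [state for state in column if state != "?"]
    counts = Counter(scored)
    return len(counts) - 1, len(scored) - max(counts.values())


def character_ci(column, s):
    """Character consistency index c = m / s."""
    m, _ = bounds(column)
    return m / s


def ensemble_ci_ri(columns, lengths):
    """Ensemble consistency index (sum m / sum s) and retention index
    ((sum g - sum s) / (sum g - sum m)). Undefined (division by zero) if
    every character is uninformative, i.e. sum g == sum m."""
    pairs = [bounds(column) for column in columns]
    m = sum(p[0] for p in pairs)
    g = sum(p[1] for p in pairs)
    s = sum(lengths)
    return m / s, (g - s) / (g - m)


# Hypothetical characters scored for four taxa, with their lengths on some tree.
columns = [["0", "0", "1", "1"], ["0", "1", "1", "1"], ["A", "A", "C", "G"]]
print(ensemble_ci_ri(columns, [1, 1, 2]))  # no homoplasy: (1.0, 1.0)
print(ensemble_ci_ri(columns, [2, 1, 2]))  # one extra step: (0.8, 0.0)

# A two-state character needing six steps: c = 1/6.
print(round(character_ci(["a", "g", "a", "g"], 6), 3))
```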

The topological differences resulting from the use of either rbcL nucleotide, string, or amino acid data might imply that different sets of morphological characters (of Nixon et al., 1994) show congruence with these different data forms. If one were to hold the evidential significance of the morphological data constant, one might identify those portions of the primary rbcL nucleotide sequence that were incongruent under each data form and ignore them in future studies. Alternatively, one could take the opposite approach and ignore those Nixon et al. (1994) characters that were not congruent among all rbcL data forms. We suggest that either approach is nihilistic with respect to either rbcL or morphology; because congruence is an outcome of total interaction, the utility of either set of evidence is always judged relative to the other. Nevertheless, hierarchic correlation can be directly assessed for one subset of total evidence if, as with rbcL, it is reasonable to assume a single, unifying functional history. If an investigator were to hold all evidence except rbcL constant, hypotheses of correlation between functional constraints and phylogenetic history could be generated from the congruence patterns of each rbcL character.

FIGURES 6-8. Total evidence analyses of morphological and rbcL data for fossil and extant seed plants. The morphological data and taxon sampling of Nixon et al. (1994; matrix version as of 8 November 1993) were followed for cladistic analyses of fossil and living seed plants (the "a" series) and of extant seed plants only (the "b" series). For both taxonomic scopes, rbcL evidence was combined as one of three data forms: nucleotide sequences (6), nucleotide string recognitions (7), or amino acid sequences (8) obtained from single organisms (see Table 2). For analyses including fossil taxa, rbcL character states were scored as missing (i.e., "?"; cf. Platnick et al., 1991; Swofford, 1993: 24). Topological results (from PAUP 3.1.1: Swofford, 1993) shown represent either the strict consensus or the combinable component consensus ([...] in all cases) of all most-parsimonious trees found (see Table 5). See text for further discussion.

[Figures 6(a), 6(b), 7(a), 7(b), 8(a), and 8(b): cladograms not reproduced.]
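The six combined matrices (all taxa versus extant taxa only, crossed with three rbcL data forms), with the rbcL states of fossil taxa scored as missing, can be assembled as in the following Python sketch. It is an illustration only: the taxon names are borrowed from the figures, but all character states and the dictionary-based matrix layout are hypothetical.

```python
def combine(morph, rbcl, taxa, missing="?"):
    """Concatenate morphological and rbcL characters for the given taxa.

    morph, rbcl: dict taxon -> list of character states.
    Taxa absent from `rbcl` (e.g. fossils) get every rbcL character coded missing.
    """
    n_rbcl = len(next(iter(rbcl.values())))
    return {t: list(morph[t]) + list(rbcl.get(t, [missing] * n_rbcl)) for t in taxa}


def six_matrices(morph, forms, fossils):
    """One combined matrix per (taxonomic scope, rbcL data form)."""
    scopes = {
        "fossil plus extant": list(morph),
        "extant only": [t for t in morph if t not in fossils],
    }
    return {
        (scope, form): combine(morph, rbcl, taxa)
        for scope, taxa in scopes.items()
        for form, rbcl in forms.items()
    }


# Hypothetical toy data: two morphological characters, three rbcL data forms.
morph = {"Ginkgo": ["0", "1"], "Gnetum": ["1", "1"], "Caytonia": ["1", "0"]}
forms = {
    "nucleotides": {"Ginkgo": ["A", "C"], "Gnetum": ["A", "T"]},
    "strings": {"Ginkgo": ["1"], "Gnetum": ["0"]},
    "amino acids": {"Ginkgo": ["L"], "Gnetum": ["I"]},
}
matrices = six_matrices(morph, forms, fossils={"Caytonia"})
for key, matrix in matrices.items():
    print(key, matrix)
```

Keeping the morphological block identical across all six matrices mirrors its use above as a "constant" against which the rbcL data forms are compared.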


CONCLUSIONS 


The phylogenetic informativeness of rbcL vari- 
ation is obviously subject to any special properties 
the gene may have. Unlike for most morphological 
characters, some such properties can be listed for 
rbcL with confidence: (i) rbcL nucleotides show 
clocklike substitutional behavior, which may either 
help or hinder tree reconstruction depending upon 
the temporal depth and asymmetry of a given phy- 
logenetic question; (ii) strong functional constraints 
exist over the majority of informative nucleotide 
characters, which is expected from (i) under the 
neutral theory; and (iii) the form that rbcL evidence 
takes (e.g., nucleotides, strings, or amino acids) 
does not appreciably affect its interaction with other 
evidence containing diverse functional histories (e.g., 
morphological data). 

Although rbcL trees often appear consistent with 
taxonomic opinion (or are substantially congruent 
with other cladistic topologies), their power as lone 
cladistic tools will always be restricted by the in- 
trinsic limits of internal evaluation of data. Because 
rbcL sequences clearly have a unifying functional 
history, simultaneous study of all available evidence becomes imperative. Functional constraints
on rbcL, rDNA, or endosperm evolution are not 
expected to be similar; therefore patterns of char- 
acter congruence among such diverse information 
sources will provide hypotheses of cladogenetic his- 


tory significantly more powerful than studies of 
rbcL alone. 
APPENDIX I (pp. 554-562).* Inferred [...] changes on the internal branches of a nucleotide-based cladogram (one of eight equally most-parsimonious trees).

This table and accompanying cladogram provide information about the functional impact of specific nucleotide changes (as reflected by alterations in amino acid identity). Following the apomorphy list format of PAUP 3.1.1 (Swofford, 1993), each internal branch of the reference tree is identified by the nodes it connects. For
each node pair, optimized nucleotide changes are identified 
by position (“POS,” i.e., the 1-1428 bases of the rbcL 
gene used), character consistency index ("c," each of 
which represents a separate contribution of the ensemble 
consistency of the entire tree; see Farris 1989a), the 
actual change inferred (““NUCA,” with arrows following 
the conventions in the PAUP 3.1.1 manual; Swofford, 
1993; 121), amino acid changes ("AA") that occur at 
this position (listed nondirectionally; see below), and their 
substitutional category ("SC") as determined from the 
PAM-250 log-odds matrix of Dayhoff et al. (1978: 352; 
log-odds scores of 0 and above are considered labile (L), 
whereas negative values are here considered nonlabile 
(NL); potentially nonlabile (PNL) indicates mixed-odds 
changes at the codon involving a given position, and 
synonymous changes (constant amino acid identity) are 
indicated by “—”). 
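The c column can be made concrete with a short calculation. The sketch below is ours, not part of the original analysis, and assumes the standard definitions: a character's consistency index is the minimum number of steps it could require on any tree (one fewer than its number of observed states, ignoring missing data) divided by the number of steps it actually requires on the tree, and the ensemble index sums both quantities over all characters. The function names are hypothetical.

```python
from typing import Iterable, Sequence


def minimum_steps(observed_states: Iterable[str]) -> int:
    """Fewest steps any tree could need: number of distinct states minus one."""
    return max(len(set(observed_states)) - 1, 0)


def character_ci(min_steps: int, observed_steps: int) -> float:
    """Per-character consistency index c = minimum steps / steps on the tree."""
    if observed_steps <= 0:
        raise ValueError("observed_steps must be positive")
    return min_steps / observed_steps


def ensemble_ci(min_steps: Sequence[int], observed_steps: Sequence[int]) -> float:
    """Ensemble consistency index: summed minimum steps / summed observed steps."""
    return sum(min_steps) / sum(observed_steps)
```

For instance, a two-state nucleotide character that needs six steps on the tree gives c = 1/6, printed as 0.167 in the tables, whereas c = 1.000 means that the character shows no homoplasy.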
For example, a line of the following form

175 1.000 c->g R,L,A NL

can be readily diagnosed: character 175 changes from
nucleotide C to nucleotide G (on this particular tree;
constancy of character-state reconstruction among all 8
trees would be indicated by a double-lined arrow) with a
c of 1.000 (i.e., no homoplasy), and the codon to which
character 175 belongs changes among the amino acids
R, L, and A (using standard IUB amino acid codes; see
Nei, 1987: 24; Swofford, 1993: 67). Note, however, that
this does not necessarily mean that this particular
character-state change gives the indicated changes in amino
acid sequence; rather, it merely indicates that it might
be involved in the changes (i.e., the C -> G nucleotide
transformation may not affect amino acid identity at all;
thus, the indicated amino acid changes are the "worst"
that can happen under the influence of character 175).
The NL designation indicates that any pairwise
transformation between R, L, and A would represent a nonlabile
change.
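The SC rule described above can be sketched as a small classifier. This is an illustrative reimplementation under stated assumptions: `score` stands for a symmetric lookup into the PAM-250 log-odds matrix, which is not reproduced here, and the toy scores below are hypothetical placeholders rather than Dayhoff et al.'s values (a published copy of PAM-250 is distributed, for example, with Biopython's `Bio.Align.substitution_matrices`).

```python
from itertools import combinations
from typing import Callable, Sequence


def substitution_category(amino_acids: Sequence[str],
                          score: Callable[[str, str], int]) -> str:
    """Classify the amino acids seen at one codon, as in the SC column.

    '-'   only one amino acid (synonymous change)
    'L'   every pairwise log-odds score is >= 0 (labile)
    'NL'  every pairwise score is negative (nonlabile)
    'PNL' a mixture of non-negative and negative scores
    """
    distinct = sorted(set(amino_acids))
    if len(distinct) < 2:
        return "-"
    # Pairs are compared nondirectionally, so each unordered pair is scored once.
    scores = [score(a, b) for a, b in combinations(distinct, 2)]
    if all(s >= 0 for s in scores):
        return "L"
    if all(s < 0 for s in scores):
        return "NL"
    return "PNL"


# Hypothetical toy scores for illustration only; NOT the PAM-250 values.
TOY_SCORES = {
    frozenset("RL"): -3, frozenset("RA"): -2, frozenset("LA"): -2,
    frozenset("SA"): 1, frozenset("TA"): 1, frozenset("ST"): 1,
    frozenset("LS"): -3,
}


def toy_score(a: str, b: str) -> int:
    """Symmetric lookup: frozenset keys make (a, b) and (b, a) identical."""
    return TOY_SCORES[frozenset((a, b))]
```

With these toy scores, `substitution_category(["R", "L", "A"], toy_score)` returns "NL", matching the example line, while a set mixing positive and negative pairs returns "PNL".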
In the line below

486 0.167 a->g L,S -

there is a nucleotide transformation in position 486, yet
it can be positively diagnosed as not responsible for the
different amino acid identities in its associated codon (thus,
the SC is given as "-").
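Lines of this form can also be read mechanically. The parser below is a sketch that assumes cleaned-up text: `->` and `=>` arrows (following the text above, `=>` is treated as the double-lined arrow marking a reconstruction that is constant among all eight trees), c written as one digit, a point, and three decimals, and four-digit positions that may run into c (for example "11760.250", i.e., position 1176 with c = 0.250). It also reports the codon containing a position, counting codon 1 from position 1. Lines with extra annotations, such as "excluded, (PRIMER)", are rejected. All names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# POS is matched lazily so that a fused "11760.250" splits into 1176 and 0.250.
LINE_RE = re.compile(
    r"^\s*(?P<pos>\d+?)\s*(?P<c>[01]\.\d{3})\s+"
    r"(?P<frm>[acgt])\s*(?P<arrow>->|=>)\s*(?P<to>[acgt])"
    r"(?:\s+(?P<aa>\S+))?(?:\s+(?P<sc>\S+))?\s*$"
)


@dataclass
class Change:
    position: int
    c: float
    from_base: str
    to_base: str
    same_on_all_trees: bool  # True for the double-lined arrow "=>"
    amino_acids: List[str]   # empty when the AA column reads "constant"
    sc: Optional[str]

    @property
    def codon(self) -> int:
        """1-based codon number containing this nucleotide position."""
        return (self.position - 1) // 3 + 1

    @property
    def codon_position(self) -> int:
        """1, 2, or 3: position of the nucleotide within its codon."""
        return (self.position - 1) % 3 + 1


def parse_change(line: str) -> Change:
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    aa = m.group("aa")
    return Change(
        position=int(m.group("pos")),
        c=float(m.group("c")),
        from_base=m.group("frm"),
        to_base=m.group("to"),
        same_on_all_trees=m.group("arrow") == "=>",
        amino_acids=[] if aa in (None, "constant") else aa.split(","),
        sc=m.group("sc"),
    )
```

Applied to the first example, `parse_change("175 1.000 c->g R,L,A NL")` places character 175 at the first position of codon 59.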


* Correction added in proof: p. 560, under "NODE 62-61," third line from bottom, right-hand column, should
read "L."
[Reference cladogram for Appendix I (one of eight equally most-parsimonious nucleotide-based trees), with numbered internal nodes. Terminal taxa shown include Conocephalum, Lophocolea, Andreaeobryum, Equisetum, Angiopteris, Psilotum, Ophioglossum, Botrychium, Anthoceros, Lycopodium, Isoetes, Selaginella, Ephedra, Welwitschia, Gnetum, Podocarpus, Taxus, Taxodium, Ginkgo, Cycas, Stangeria, Zamia, Lilium, Piper, Chloranthus, Calycanthus, Persea, Drimys, Ceratophyllum, Eupomatia, Magnolia, Platanus, Trochodendron, Hamamelis, Dillenia, Chrysolepis, Betula, and Casuarina.]
NODE 78-77 11760.250 a->g E,D = NODE 70-42
POS c NUCA AA sc 1254 0.429 t-»a constant - POS c NUCA AA sc 
68 0.200 a->c T,N L  13630.167 c->t constant 7 48 0.500 t=>c t 
69 0,500 c->t T,N = 102 0.429 t=> tees; i 
102 0.429 a->t constant - NODE 76-71 109 0.333 Se ee z 
150 0.231 a->t A,P,S L Pos c NUCA AA SC 280 0.250 g->a D,E,K,T r: 
165 0.231 a->t A,W NL 60 0.250 t->c constant - 313 0.250 t->c Sieger - 
175 1.000 c->g R,L,A NL 111 0.200 a=>g constant > 336 0.250 t->c constant 
186 0.222 a->c constant - 138 0.273 t->a P,L - 403 0.333 c=>t constant a 
204 0.375 a->t constant - 165 0,231 t-»c A,W = 447 0.167 a->g Q,M,I,T,L,W L 
e 1.000 a-»c constant - 225 0.333 t-»c constant - 453 0.273 a->g cde ed d - 
Eeer a. x AE E Ti 
405 0.222 t-»a constant - 327 0.167 a-»g constant - 630 0.300 ES aae - 
433 0.250 a->t T,V,S,I PNL 351 0.16? t=>c constant - 648 0.200 t-»c S - 
435 0.300 a->c T,V,S,I PNL 486 0.167 a->g L,S - 690 0.429 t=>c A,G,T - 
be 0.200 t=>c constant - 564 0.214 t->a A,V - 699 0.250 t-»c REE - 
E: "ee wer — ñ m — a->g constant - 711 0.250 a~>t constant - 
š t->c constant > 729 0.250 t->c constant - 
oa 0.667 c-»g S,C,Y L 682 0.333 t-»g S,A L 840 0.167 g-»a L,S - 
s (es e> —— PNL 708 0.200 a=>g constant - 843 1.000 t=>c A,S - 
Me Rin m " NL 759 0.333 a-»g constant - 870 0.600 t-»a constant - 
a constant = 785 0.200 c->t V,M,A PNL 885 0.286 t-»c constant - 
Gi 0.200 t->c V,M,A PNL 786 0.500 t->g V,M,A L 915 0.167 a-»g KR * 
G Hirt at V,M,A L 844 0.200 t-»c H,Y,S,F PNL 975 0.333 t=>c constant - 
M a-»t constant - 858 0.167 t-»c constant - 982 0.182 g->t A,S,T L 
iin Lec. IZ 77 
T — = Ç - š g->c Q,E,D L 
2 Here — H,Y,S,F PNL 11980.167 t-»c L,S =  10420.167 t-»c L,S - 
Len “hem acre D,R - 12120.429 a->g constant -  10680.333 a->g K,R,E - 
Mem s a L,M Ü 1320 0.143 a=>g Q,E,A - 10770.200 t->c constant = 
ise oe ES Constant - 13350.167 t->c constant = 11760.250 g->a E,D = 
ido E constant - 13980.250 a->g R,K,I - 11980.167 c->t L,S = 
E a->t constant - 12060.111 t->c constant - 
Mes lE Soe — - NODE 71-70 12210.200 a-»g L,S - 
11010.250 t-»c —— Be — — — — d 
bere E i 88 0.143 g->a E,K,Q,T Ln 12600.250 t-»c constant - 
12120.429 t-»a constant - — ia "poo Sas usss : 
a oe oe 150 0.231 t=>c A,P,S 5 1335 0.167 c->t constant - 
12600.250 c->t — a Pn Se PENTECOTE 2 SE eg s 
ege d eegen "E = Ee EE  — 
IS ge A oe n 315 0,167 a->g constant = 13110,200,t-»c. constant E 
13460.125 = EE 342 1.000 c=>t constant = 
,S,T, L 387 0.333 t=>c constant - NODE 42-41 
NODE 77-76 405 0.222 a->g constant - POS c NUCA AA sc 
pos EE 414 0.167 a=>g S,L > 132 0.286 t->c constant - 
120 0.167 Aere c SC 444 0.167 t-»c F,C = 189 0.375 t=>c constant - 
SY gaa coe nstant = 510 0.167 a->g constant - 207 0.600 t=>c constant = 
RTE acne —— 519 0.182 c->t constant = 225 0.333 c=>t constant - 
56? 1.000 ns ,P,L,T,I NL 711 0.250 g->a constant - 267 0.375 t=>c P,T = 
REL constant - 720 0.200 a-»g constant - 297 0.200 t=>c A,V,C = 
SE OSs cre constant - 792 0.500 t->c I,S œ 324 0.167 t=>c constant - 
RE Ges SC constant - 795 0.250 t-»c V,G = 441 0.286 t=>c constant - 
DE: Caen x Constant = 822 0.143 t->c constant - 459 0.250 t=>c constant - 
ET I SES - 876 0.143 t-»c constant - 528 0.429 a->g constant z 
930 0.111 vex ‘ L 981 0.143 t->c constant = 567 1.000 t->c constant - 
960 1.000 ac SUMMAE - 10710.167 t=>c constant - 676 0.500 t-»a Y,N,F NL 
963 0.182 RE e - 11010.250 c->t constant - 696 0.286 a=>g constant = 
Si ies oe x4 — 411280.222 t-»c constant =~ 702 0.200 a-»g constant ~ 
984 0.182 d Pe —— - 11490.111 t-»c constant - 718 0.333 t-»c constant - 
10180.250 c-> -S,T - 11680.250 t=>c L,* - 744 0.250 a=>g constant - 
"reda: eras Q,E,D L  11700.143 a-»g L,* - 168 0.167 t=>c C,F = 
TFS: constant - 11790.400 t=>c constant - 780 0.143 a=>g constant G 
TTT * L 12450.200 t->a constant - 855 0.400 a=>g constant = 
1116 0.229 e Les - 13630.167 t->c constant - 897 0.667 a=>t A,V = 
11250.375 a-> * 3 — toi ae ; 
11370.231 oso LS,F,.LM L 969 0.429 a-»g constant - 
z 993 0.750 a=>g constant = 


a->g 


constant 





558 Annals of the 
Missouri Botanical Garden 
996 0.400 a=>g constant - 279 0.167 a=>g constant - 456 0,222 t->a constant - 
10110.429 a=>g constant = 315 0.167 g->a constant e 471 0,500 t=>g AN = 
10210.333 a->g V,I,L,M L 412 0.200 t=>c S,L - 505 0.200 c->t constant - 
10950.500 t=>c constant - 505 0.200 t-»c constant = 538 0.400 t=>c L,I - 
11370.231 g->a constant - 534 0.200 a-»g constant - 718 0.333 t->c constant * 
11400.300 a=>g constant - 549 0.200 a->g constant - 759 0.333 g=>a constant - 
11640.333 t=>c P,Q,S * 600 0.333 t=>c constant - 768 0.167 t=>c C,F - 
1212 0.429 g-»c constant - 624 0.667 t=>c constant - 825 0.375 t=>g T,I - 
12960.333 t=>c constant - 663 0.500 c->t V,C - 835 0.500 a=>t S,I,T L 
13771.000 t-»c constant - 687 0.167 a-»g constant - 836 0.222 g-»c S,I,T L 
696 0.286 a-»g constant = 837 0.300 c=>g S,I,T > 
NODE 70-69 780 0.143 a-»g constant m 10230.231 c-»a V,I,L,M a 
pos c NUCA AA sc 813 0.231 a=>g constant - 11220.400 t=>c constant = 
10 0.500 c=>a - 861 0.143 t->c constant - 11280.222 c->t constant ñ 
excluded, (PRIMER) 963 0.182 c->t C,S - 11980.167 c->t L,S ^ 
15 1.000 g=>a - 10290.250 a-»g constant = 12240.429 a->g constant > 
excluded, (PRIMER) 1047 0.429 t=>g constant = 12630.500 a=>g R,* = 
75 0.125 ct Y,F = 10560.167 c->t constant - 13890.143 g->a constant L 
84 0.214 t-»g D,E,Q L 11400.300 a=>g constant - 13971.000 a->t R,K,I NL 
88 0.143 a->c E,K,Q,T L 11730.167 t=>c constant - 14131.000 a->t T,A,S,E,P ~- 
108 0.400 t-»c I,T = 11850.200 a-»g constant - 
124 1.000 a-»g M,V,L L 12030.200 a=>g constant - NODE 44-43 
126 1.000 g-»a M,V,L -  13980.250 g->a R,K,I - pos c NUCA AA sc 
165 0.231 c-»a A,W - 10 0.500 a-»c - excluded, 
201 0.250 t=>c constant - NODE 66-48 18 0.333 g-»a - excluded, 
246 0,333 t-»a constant = Pos c NUCA AA SC 81 0.333 t->a constant - 
271 0.250 g-»c P,A,V,T L 33 0.500 t-»c V,S,F,D,A  - 258 0.333 c=>t G,D,E,N,H  - 
318 0,250 t=>c constant - 84 0.214 g=>a D,E,Q - 284 0.286 a-»g N,D,S,T,E,G L 
321 0.333 g=>t constant - 138 0.273 c->t P,L - 318 0.250 c=>t constant < 
327 0.167 g-»a constant - 243 0.200 a=>g constant - 414 0.167 g=>a S,L = 
388 0.333 t=>c constant - 290 0.125 a-»t Y,F L 435 0.300 c-»a T,V,S,I c 
397 0.333 t-»c L,S,I - 297 0.200 t-»c A,V,C - 450 0.214 t-»c constant Z 
486 0.167 g->a L,S - 309 0.143 t-»c constant - 498 0.333 c-»t constant -~ 
504 0.167 t-»c constant - 312 0.182 t-»c P,F - 504 0.167 c=>t constant e 
522 0.286 t-»c constant = 346 0.333 a->c M,L L 507 0.333 a=>g constant < 
660 0,167 c->t constant - 498 0.333 t-»c constant - 522 0.286 c-»a constant 3 
661 1.000 g=>t V,C - 546 0.250 t-»c constant - 564 0.214 a->c A,V Y 
662 1.000 t=>g V,C - 552 0.200 c-»t constant - 579 0.375 t-»c constant =~ 
663 0.500 a-»c V,C - 570 0.200 t-»c constant = 612 0.111 g->a constant z 
672 0.300 t->a constant - 612 0.111 a-»g constant = 618 0.333 a=>g constant D 
673 0.111 c=>a L,I - 639 0.333 t=>c constant — 702 0.200 a-»g constant -~ 
764 0.400 a=>t A,Q,E,V,H,I NL 656 0.500 t-»g L,V,C - 813 0.231 g->a constant -~ 
786 0,500 g-»t V,M,A - 657 1.000 a-»c L,V,C NL 952 0.500 t-»c L,S Y 
810 0,333 g-»a constant - 693 0.167 a-»g constant - 984 0,182 c->t A,S,T É 
837 0.300 t->c S,I,T - 771 0.375 t->c constant * 10450.333 c=>t constant - 
852 0.286 t-»c constant v 808 0.167 t-»c constant = 1107 0.333 a-»c constant Ç 
864 0.333 t=>c constant = 810 0.333 a->g constant - 11160.222 a-»g P,A E 
906 0.286 c->t D,R - 822 0.143 c->t constant -  11370.231 g-»a constant ` 
927 0.231 t-»g I,M L 885 0.286 t-»c constant -  11400.300 g=>a constant — ^ 
940 0.250 t=>c L,S - 914 0.143 a-»g K,R L  12151.000 a-»c constant — 
10170.333 t-»a constant - 954 0.286 a-»g L,S - 12660.429 t-»c constant C 
1023 0.231 a-»c V,I,L,M - 10210.333 a->g V,I,L,M L 13380.333 t=>c constant ` 
10580.500 a=>t Y,F,C,L L  10420.16) t-»c L,S -  13460.125 g-»c A,S,T,C ` 
11160.222 t-»a P,A -  12210.200 a-»g L,S -  13590.286 c->t P,A,L E 
11230.250 t=>c L,S,F,I,M L  12450.200 a-»t constant - 
12120.429 g-»a constant - 13200.143 g=>a Q,E,A - NODE 48-47 c 
Ven ee et A a ~ sos e EK E 
13920.143 a->g constant STEEN ec " EC ee p 
` - 160.667 g-»t I,M,V,W L 150 0.231 c-»t A,P,S 3 
— 14220.429 g=>t T,V,L,K 7 159 0.167 a-»g constant 
E 69-66 " 
165 0.231 a-»t A.W 3 
POS c NUCA AA sc NODE 48-44 549 0.200 g-»a constant = 
EE M 
Tis 0 385 — e i SOR - 90 0.250 a-»g E,K,Q,T - 741 0.111 t-»c — 
ARS EN ge = 147 0.154 a=>c constant = 861 0.143 c-»t cons < 
267 0.375 ges P,T or RI UE Hae A cero tant ~ 
276 0.286 a-»g — en A hee ee ee GE t * 
: 7 393 0.231 a-»g R,P - 12690.600 t-»c constan 
1410   0.429  a->g   E,D,A,K,P,Q
1420   1.000  a->g   T,V,L,K
1421   0.667  c->t   T,V,L,K
1425   0.429  a->g   L,V,C

NODE 47-46
POS    c      NUCA   AA
75     0.125  t=>c   Y,F
102    0.429  t=>c   constant
117    0.500  a=>g   constant
177    0.300  c=>t   R,L,A
231    0.286  t->c   constant
246    0.333  a=>g   constant
321    0.333  t->c   constant
346    0.333  c->a   M,L
402    0.500  t->c   constant
405    0.222  g->a   constant
412    0.200  c=>t   S,L
519    0.182  t->c   constant
522    0.286  c->t   constant
552    0.200  t->c   constant
660    0.167  t=>c   constant
753    0.188  g->a   L,M,I
807    0.250  t->c   constant
834    0.600  t->c   T,M
957    0.400  t->a   R,C
963    0.182  t->c   C,S
1067   1.000  a=>g   K,R,E
1194   0.250  t=>c   S,F,A
1206   0.111  t=>c   constant
1257   0.500  t=>g   constant

NODE 46-45
POS    c      NUCA   AA
88     0.143  c->a   E,K,Q,T
141    0.333  a->g   constant
162    0.429  a->g   A,W
279    0.167  g->a   constant
284    0.286  a->c   N,D,S,T,E,G
741    0.111  c->t   S,C,Y
762    0.333  a->t   A,Q,E,V,H,I
957    0.400  a->c   R,C
1209   0.286  t->c   constant
1266   0.429  t->c   constant
1362   0.429  a->g   E,D

NODE 66-65
POS    c      NUCA   AA
62     0.500  g=>a   R,K,T
66     0.167  a->g   L,I
88     0.143  c->g   E,K,Q,T
144    0.333  g->t   constant
153    0.111  g->a   constant
162    0.429  a=>g   A,W
168    0.273  a->g   constant
201    0.250  c=>a   constant
207    0.600  t=>g   constant
255    0.200  t->c
256    0.667  g=>c   G,D,E,N,H
271    0.250  c->g   P,A,V,T
363    0.250  a=>g   constant
378    0.500  a=>c   constant
408    0.167  a->g   constant
450    0.214  t->c   constant
453    0.273  a->g   constant
462    0.429  t->c   constant
486    0.167  a->g   L,S
492    0.250  a=>g   constant
522    0.286  c->t   constant
537    0.429  t->a   constant
579    0.375  t->c   constant
582    0.167  t=>c   constant
618    0.333  a=>g   constant
621    0.250  t=>c   constant
648    0.200  t=>c   constant
666    0.500  a=>c   constant
684    0.300  t=>g   A,S
690    0.429  t=>c   A,G,T
705    0.333  t->c   I,V
708    0.200  g->a   constant
762    0.333  a=>c   A,Q,E,V,H,I
795    0.250  c->a   V,G
807    0.250  t=>c   constant
816    0.750  a->g   constant
819    0.250  t=>a   constant
882    0.100  c->t   constant
912    0.333  a=>g   constant
933    0.143  c->t   constant
984    0.182  c->t   A,S,T
990    0.500  t->a   T,I
1005   0.375  t->g   constant
1017   0.333  a->c   constant
1020   0.200  a->g   Q,E,D
1060   0.333  a->g   Y,F,C,L
1107   0.333  a=>c   constant
1131   0.333  a->g   constant
1206   0.111  t->c   constant
1266   0.429  t=>g   constant
1278   0.500  t=>g   A,V
1330   0.167  g->a
1341   0.200  t->c   A,S,T,C
1401   0.250  t=>c   constant
1407   0.500  t->c   F,I,L
1411   0.600  a->c   T,A,S,E,P

NODE 65-64
POS    c      NUCA   AA
165    0.231  a=>t   A,W
186    0.222  c=>t   constant
228    0.125  t=>c   N,S
351    0.167  c=>t   constant
456    0.222  t->c   constant
537    0.429  a->g   constant
555    0.100  t->c   constant
672    0.300  a->t   constant
673    0.111  a=>c   L,I
741    0.111  t=>c   S,C,Y
753    0.188  g->a   L,M,I
879    0.667  t=>c   constant
915    0.167  a=>g   K,R
982    0.182  g=>t   A,S,T
990    0.500  a->c   T,I
1011   0.429  a=>g   constant
1017   0.333  c->g   constant
1047   0.429  g=>a   constant
1080   0.333  t=>c   constant
1137   0.231  g=>a   constant
1167   0.200  t=>c   A,L
1194   0.250  t=>c   S,F,A
1356   0.143  t->c   constant
1411   0.600  c->g   T,A,S,E,P
1422   0.429  g=>c   T,V,L,K
1425   0.429  a->g   L,V,C


NODE 64-63
POS    c      NUCA   AA
150    0.231  c=>t   A,P,S
153    0.111  a->g   constant
309    0.143  t=>c   constant
378    0.500  c->g   constant
474    0.250  a=>g   constant
564    0.214  a->g   A,V
612    0.111  a->g   constant
696    0.286  g=>a   constant
753    0.188  a->c   L,M,I
771    0.375  t=>c   constant
813    0.231  g->a   constant
885    0.286  t=>c   constant
927    0.231  g=>a   I,M
951    0.222  a->g   constant
1060   0.333  g->a   I,Y
1299   0.125  a=>g   constant
1320   0.143  g=>a   Q,E,A
1380   0.200  a->g   E,A

NODE 63-54
POS    c      NUCA   AA
84     0.214  g=>c   D,E,Q
433    0.250  t->a   T,V,S,I
546    0.250  t=>c   constant
672    0.300  t->c   constant
1020   0.200  g->c   Q,E,D

NODE 54-53
POS    c      NUCA   AA
543    0.333  t->c   constant
813    0.231  a->g   constant
982    0.182  t->g   A,S,T
1245   0.200  a->t   constant

NODE 53-51
POS    c      NUCA   AA
45     0.750  t->c   constant
424    0.200  c=>a   V,P,L,T,I
425    0.200  c->t   V,P,L,T,I
433    0.250  a=>g   T,V,S,I
434    0.250  c=>t   T,V,S,I
672    0.300  c->t   constant
753    0.188  c->g   L,M,I
864    0.333  c->t   constant
915    0.167  g->a   R,K
1408   0.500  g=>a   E,D,A,K,P,Q

NODE 51-49
POS    c      NUCA   AA
162    0.429  g->a   constant
168    0.273  g->a   constant
655    0.250  t=>g   L,V,C
684    0.300  g=>a   S,A
732    0.125  a=>g   constant
836    0.222  g=>c   S,I,T
1131   0.333  g=>a   constant
1167   0.200  c=>t   A,L
1345   0.154  a=>t   A,S,T,C
NODE 51-50
POS    c      NUCA   AA
57     0.333  t=>g   D,E
84     0.214  c=>a   D,E,Q
284    0.286  a=>g   N,D,S,T,E,G
561    0.333  a=>g   constant
714    0.500  a=>g   R,K
1111   0.286  a=>t   L,M,T
1140   0.300  g=>c   constant
1318   0.500  g=>c   Q,E,A

NODE 53-52
POS    c      NUCA   AA
108    0.400  c=>t   I,T
290    0.125  a=>t   Y,F
297    0.200  t=>c   A,V,C
357    0.286  c=>t   constant
673    0.111  c=>a   L,I
682    0.333  g=>t   S,A
771    0.375  c=>t   constant
807    0.250  c=>t   constant
855    0.400  a=>g   constant
1239   0.500  t=>c   constant
1410   0.429  a->c   E,D,A,K,P,Q

NODE 63-62
POS    c      NUCA   AA
138    0.273  c=>t   P,L
279    0.167  g=>a   constant
435    0.300  c=>t   T,V,S,I
456    0.222  c->t   constant
732    0.125  a->g   constant
762    0.333  c->g   constant
861    0.143  c=>t   constant
1017   0.333  g->a   constant
1032   0.429  t=>c   constant
1245   0.200  a->g   constant
1251   0.273  a->c   G,A
1266   0.429  g->a   constant
1270   0.500  t->c   L,S,V
1341   0.333  a->g   constant
1356   0.143  c->t   constant
1422   0.429  c->t   T,V,L,K

NODE 62-61
POS    c      NUCA   AA
84     0.214  g=>a   D,E,Q
165    0.231  t->c   A,N
276    0.286  g=>a   constant
393    0.231  a=>c   R,P
420    0.250  t->c   I,V
672    0.300  t->a   constant
684    0.300  g=>a   S,A
762    0.333  g->t   constant
858    0.167  c=>t   constant
1005   0.375  g->t   constant
1015   0.500  c=>a   constant
1111   0.286  a->c   L,M,T
1113   0.500  g=>a   L,M,T
1167   0.200  c->t   A,L

NODE 61-59
POS    c      NUCA   AA        SC
177    0.300  c->g   R,L,A
290    0.125  a=>t   Y,F
564    0.214  g=>a   A,V
690    0.429  c=>t   A,G,T
732    0.125  g->a   constant  -
836    0.222  g->c   S,I,T     L
1278   0.500  g=>a   A,N       -
1401   0.250  c=>t   constant  -

NODE 59-58
POS    c      NUCA   AA           SC
153    0.111  g->a   constant     -
177    0.300  g->t   R,L,A        -
228    0.125  c=>t   N,S          -
284    0.286  a=>g   N,D,S,T,E,G  L
312    0.182  t=>c   P,F          -
390    0.667  a=>g   constant     -
450    0.214  c->t   constant     -
528    0.429  a=>t   constant     -
603    0.143  g=>a   constant     -
673    0.111  c=>a   L,I          L
711    0.250  a->g   constant     -
885    0.286  c=>t   constant     -
927    0.231  a=>g   I,M          L
982    0.182  t=>g   A,S,T        L
1137   0.231  a->g   constant     -
1320   0.143  a=>g   Q,E,A        -
1380   0.200  g->a   E,A          -

NODE 58-57
POS    c      NUCA   AA        SC
84     0.214  a=>c   D,E,Q     L
147    0.154  a=>g   constant  -
225    0.333  c->t   constant  -
412    0.200  c=>t   S,L       -
498    0.333  t->c   constant  -
543    0.333  t=>c   constant  -
655    0.250  t=>c   L,V,C     -
684    0.300  a=>g   S,A
753    0.188  c->g   L,M,I
836    0.222  c->g   S,I,T
1185   0.200  g=>a   constant  -
1224   0.429  a=>g   constant  -
1251   0.273  c->t   G,A       -

NODE 57-56
POS    c      NUCA   AA           SC
60     0.250  c=>t   constant     -
117    0.500  a->c   constant     -
153    0.111  a->g   constant     -
220    0.500  c->g   L,V          L
267    0.375  c->a   P,T          -
378    0.500  g=>a   constant     -
384    0.333  a->g   constant     -
424    0.200  c->a   V,P,L,T,I    L
486    0.167  g=>a   L,S          -
501    0.333  t=>c   constant     -
510    0.167  g=>a   constant     -
537    0.429  g=>a   constant     -
552    0.200  c=>t   constant     -
588    0.400  a=>g   constant     -
621    0.250  c=>t   constant     -
813    0.231  a=>c   constant     -
864    0.333  c=>t   constant     -
963    0.182  t->c   C,S          -
1029   0.250  g=>a   constant     -
1058   0.500  t=>a   Y,F,C,L      L
1071   0.167  c=>t   constant     -
1077   0.200  t=>c   constant     -
1137   0.231  g->a   constant     -
1176   0.250  g=>a   E,D          -
1203   0.200  g=>a   constant     -
1209   0.286  t=>c   constant     -
1245   0.200  g->c   constant     -
1345   0.154  a=>g   A,S,T,C
1346   0.125  g=>c   A,S,T,C
1347   0.200  c=>t   A,S,T,C
1362   0.429  a=>g   E,D
1408   0.500  g=>c   E,D,A,K,P,Q
1409   0.250  a=>c   E,D,A,K,P,Q

NODE 56-55
POS    c      NUCA   AA
117    0.500  c->g   constant
165    0.231  c->a   A,W
168    0.273  g=>a   constant
393    0.231  c=>t   R,P
763    0.333  g=>a   A,Q,E,V,H,I
1335   0.167  c=>t   constant

NODE 61-60
POS    c      NUCA   AA
225    0.333  c=>t   constant
543    0.333  t->c   constant
741    0.111  c=>t
753    0.188  c->a   L,M,I
813    0.231  a->g   constant
1026   1.000  t=>c   constant
1269   0.600  t=>c   constant
1345   0.154  a=>t   A,S,T,C

NODE 69-68
POS    c      NUCA   AA
18     0.333  g->a   -  excluded (PRIMER)
30     0.250  t->c   constant
60     0.250  c->t   constant
81     0.333  t->a   constant
93     0.400  c=>t   T,P,V
96     0.333  a=>g   K,L,S
99     0.600  t->c   D,A,E
147    0.154  a=>g   constant
186    0.222  c->t   constant
213    0.250  c=>t   constant
282    0.429  a=>c   D,E,K,T
306    0.375  t=>a   A,V
351    0.167  c=>t   constant
366    0.667  t->g   constant
372    0.250  a->t   constant
393    0.231  a->c   R,P
402    0.500  t=>c   constant
424    0.200  c->a   V,P,L,T,I
433    0.250  t=>a   T,V,S,I
434    0.250  c=>t   T,V,S,I
435    0.300  c->a   T,V,S,I
441    0.286  t->a   constant
444    0.167  c->t   F,C
495    0.167  t->c   constant
498    0.333  t->a   constant
528    0.429  a=>t   constant
538    0.400  t=>c   L,I
603    0.143  g->a   constant
651    0.500  t->c   constant
654    0.333  c->t   constant
655    0.250  t=>g   L,V,C
657    1.000  a=>t   L,V,C
666    0.500  a->g   constant
684    0.300  t->a   S,A
720    0.200  g->a   constant
732    0.125  a->g   constant
753    0.188  g->a   L,M,I
768    0.167  t->c   C,F
771 0.375 t=>a constant - 10320.429 a-»c constant - 
792 0.500 c->t I,S - NODE 76-75 10350.250 t-»c constant - 
804 0.250 c->t constant * POS c NUCA AA sc 10861.000 t=>c I,V,L = 
834 0.600 t->c T,M = 28 0.500 a->g - excluded, 11010.250 c->t constant = 
836 0.222 g=>c S,I,T L 61 1.000 a=>c R,K,T = 11160.222 t=>g P,A = 
858 0.167 c=>t constant ka 66 0.167 a=>g L,I - 1182 0.250 t-»c constant - 
865 0.167 c->t L,P - 90 0.250 g=>a E,K,Q,T e 12870.200 a->g Q,E,K - 
942 0.200 a=>g L,S - 189 0.375 t-»c constant - 13500.250 a-»g constant m 
957 0.400 t->a R,C = 195 0.750 a-»c constant - 13650.333 a->g constant S 
981 0.143 c->t constant * 207 0.600 t=>a constant š 13951.000 c=>t constant - 
10380.500 t=>g constant = 213 0.250 c-»a constant —  14010.250 t=>c constant - 
11280.222 c->t constant zg 345 0.333 c-»t constant - 
11921.000 t-»g S,F,A L 375 0.333 t-»c F,S - NODE 74-73 
12180.200 t-»c constant E 403 0.333 c=>t constant - POS c NUCA AA sc 
12270.286 c-»t H,Q 7 415 0.250 c-»a constant - 69 0.500 t-»c N,T - 
12450.200 a-»c constant 7 445 0.500 c-»a Q,M,I,T,L,W L 88 0.143 g->a E,K,Q,T L 
12510.273 a-»c G,A 7 446 0.500 a-»t Q,M,I,T,L,W NL 120 0.167 c->t constant - 
13230.333 t-»g constant - 459 0.250 t=>c constant - 153 0.111 a=>g constant - 
13320.500 t=>a I,V 7 600 0.333 t-»c constant - 201 0.250 t-»c constant - 
14100,429 a-»g E,D,A,K,P,Q - 612 0.111 a-»g constant - 213 0.250 a-»c constant - 
14140.333 a->g T,A,S,E,P  L 677 0.167 a=>t Y,N,F L 225 0.333 t=>c constant - 
684 0.300 t->c S,A - 243 0.200 a-»g constant - 
NODE 68-67 693 0.167 a-»g constant = 276 0.286 c->g constant - 
POS c NUCA AA SC 723 0.250 t->c constant - 306 0.375 t->a A,V - 
33 0.500 t-»a V,S,F,D,A L 764 0.400 a=>t A,Q,E,V,H,I NL 327 0.167 a->g constant - 
40 1.000 a=>c KO L 765 0.429 a=>c A,Q,E,V,H,I L 345 0.333 t->c constant - 
Bl 0.333 a-»g constant = 808 0.167 t->c constant - 351 0.167 t=>c constant - 
150 0.231 c=>t A,P,S - 845 0.500 a->t H,Y,S,F L 387 0.333 t-»a constant = 
207 0.600 t-»g constant = 897 0.667 a=>t A,V = 390 0.667 a=>g constant - 
255 0.200 t-»c Y.C - 906 0.286 c->t D,R - 391 0.333 c=>a R,P = 
259 1.000 a-»c I,L L 996 0.400 a-»g constant - 397 0.333 t->c L,S,I L 
261 0.167 c->t I,L =  10320.429 t-»a constant - 405 0.222 a-»g constant = 
339 0,500 t=>g constant - 10800.333 t-»c constant - 423 0.500 t->c constant - 
369 0.250 t-»c constant -  11220.400 t=>a constant - 438 0.333 a-»g K,O = 
387 0.333 c=>t constant = 11230.250 t-»c L,S,F,I,M NL 447 0.167 a=>g Q,M,I,T,L,W L 
393 0.231 c-»g R,P - 11400.300 a-»t constant - 528 0.429 a-»c constant - 
397 0.333 c-ót L,S,I - 12360.333 a-»t constant - 678 1.000 t-»c Y,N,F - 
427 0.333 g=>t A,S - 12750.333 a-»g constant = 693 0.167 g->a constant = 
$50 0.214 t->a constant = 708 0.200 a=>g constant - 
459 0.250 t=>c constant - NODE 75-74 138 0.667 g->t T,N - 
513 0.750 a->g constant = pos c NUCA AA sc 772 0.500 a-»c R,K = 
543 0.333 t->a constant - 31 0.500 g->t V,S,F,D,A pNL 808 0.167 c->t constant - 
564 0.214 a-»c A,V - 96 0.333 a-»g K,L,S 8 846 0.333 t->c H,Y,S,F = 
618 0.333 a-»c constant - 108 0.400 t-»c I,T ° 942 0.200 a=>g L,S = 
$33 0.200 t-»c constant 7 109 0.333 t-»c constant -  10230.231 a->c V,I,L,M e 
NEE EE E ege, 
117 YR pie constant 7 195 0.750 c-»g constant - : E froe n 
` =>t constant - 228 0.125 t-»c N,S - 10650.333 a=>g constan 
* 0.250 a-»g constant =- 261 0.167 t=>c LL USOT TREO n 
e * E -> nstant - 
H cm P hae EE tee - 
930 o 111 c=> ME n adde roh —— x 1179 0. 400 t=>c constant - 
on Ken c=>t constant = 339 0.500 t->c constant = —— acis EE B 
984 Kä SCH Constant = 444 0.167 t-»c F,C e — L200 o S E 
— Ss A,S,T = 453 0.273 a->g constant = ven: I SÉ EPA = 
13226 a0 X uS constant E 522 0.286 t-»c constant = AT — Ë 
c =>c constant 7 618 0.333 a-»c constant - s * 
11760.250 g->a E,D z x -  12150.333 g-»a constant ü 
11790.400 c=>t co UTE 13590.286 t-»c P,A,L - 
12510.273 M constant - 780 0.143 a->g constant - Kee aog REI bs 
1287 0.200 AD. Ech > 786 0.500 t->a V,M,A = i 
13020.286 a- g E,Q,K > 789 0.429 t=>c constant — — 
— SH Constant S 795 0.250 t->c V,G = SEE RA nye 
1338 0.333 Wë Q,E,À - 801 0.500 t-»c constant - POS c iK MNT ad 
1374 1.000 © constant - 819 0.250 t-»a constant - 31 0.500 t-»g V, Et x S 
13850 50 t->g constant + 914 0.143 a->g K,R L 81 0.333 t=>c constan E 
14110.¢ v - 927 0.231 t->g LN L 117 0.500 a->c constant 
Riis dee a->t T,A,S,E,P L 954 0.286 a=>g S,L - 132 0.286 t=>c constant 
TT a->g T,A,S,E,P  - 976 0.250 a->g constant - 144 0.333 a=>g constant 
, g-»a T,V,L,K 7  10080.500 a-»g constant - 148 0.333 c=>g A,P,S L 


1425 0.429 a->g L,V,C,I
168    0.273  a=>g   constant     -
204    0.375  t=>c   constant     -
258    0.333  t=>a   G,D,E,N,H    L
291    0.333  t=>c   Y,F          -
306    0.375  a->g   A,V          -
313    0.250  t=>c   constant     -
315    0.167  g->a   constant     -
321    0.333  a->c   constant     -
402    0.500  t->g   constant     -
465    0.333  c=>t   constant     -
471    0.500  t=>c   V,A          -
477    0.143  a->g   constant     -
504    0.167  t=>c   constant     -
534    0.200  a=>g   constant     -
537    0.429  t=>a   constant     -
546    0.250  t=>c   constant     -
552    0.200  c=>t   constant     -
577    1.000  c=>t   constant     -
579    0.375  t=>a   constant     -
588    0.400  a=>g   constant     -
591    0.667  t=>c   constant     -
597    0.167  t=>c   constant     -
603    0.143  a=>g   constant     -
612    0.111  g->a   constant     -
618    0.333  c->a   constant     -
663    0.500  a=>g   V,C          -
696    0.286  a->g   constant     -
702    0.200  a->g   constant     -
729    0.250  t->c   constant     -
732    0.125  a=>g   constant     -
744    0.250  a=>g   constant     -
753    0.188  g->a   L,M,I
765    0.429  c->a   C,F
785    0.200  c->t   V,M,A
804    0.250  c=>t   constant     -
837    0.300  t->c   S,I,T
840    0.167  g->a   L,S
849    0.250  t->c   constant     -
852    0.286  t=>c   constant     -
870    0.600  t->c   constant     -
876    0.143  t=>c   constant     -
914    0.143  g->a   K,R
945    0.750  t=>c   constant     -
969    0.429  a=>g   constant     -
976    0.250  g->a   constant     -
1021   0.333  g=>a   V,I,L,M      L
1042   0.167  t=>c   L,S          -
1053   0.200  t=>c   constant     -
1095   0.500  t=>c   constant     -
1194   0.250  t=>c   S,F,A        -
1198   0.167  t=>c   L,S
1200   0.600  a->c   L,S          -
1230   0.667  t->a   constant     -
1345   0.154  a->g   A,S,T,C
1346   0.125  g=>c   A,S,T,C
1362   0.429  a->c   E,D
1365   0.333  g->a   constant     -
1380   0.200  a->g   E,A
1392   0.143  a->g   constant     -


APPENDIX II (pp. 562-567; corrections in proof, p.
566). Inferred amino acid changes on the internal
branches of a string-based cladogram (one of 165 equally
most-parsimonious trees), including summary statistics of the
string search and the resultant matrix of apomorphic
recognitions.


Similar to Appendix I, the following table and
accompanying reference cladogram contain information about
the functional impact of specific string changes (as
reflected by alterations in amino acid identity).
Interpretation is as in Appendix I, with the following exceptions:
(i) relative branch length (changes per given branch
divided by total steps) is given, (ii) "CHAR" indicates the
string character number from the matrix at the end of
this appendix, (iii) "POS." still refers to nucleotide
position, but here to the starting (5') position of a string
recognition, (iv) "STR., SEQ." indicates first the number
of simulated nucleotides (i.e., string length) followed by
the string itself (divided to show the codon positions of its
component nucleotides), and (v) "AA-seq." shows each
alternative amino acid sequence identified by a particular
string recognition. Under the latter category, internal stop
codons are indicated by *1, *2, or *3 (for TAA, TAG,
and TGA, respectively), and missing nucleotide data have
sometimes necessitated the indication (by "?") of missing
amino acids. Again, Dayhoff et al. (1978) PAM-250
log-odds calculations were determined nondirectionally for
each combination of amino acid sequences.
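The codon division shown in the "STR., SEQ." column follows directly from the start position. Assuming that POS gives the string's first (5'-most) nucleotide and that position 1 is the first base of codon 1, which agrees with the tabulated examples (e.g., 313b "tta gat t", 1254 "t gct aa", 980 "ac gct gg"), a string can be split as in the sketch below; the function name is hypothetical.

```python
def codon_split(string: str, start: int) -> str:
    """Insert spaces at codon boundaries for a string starting at 1-based position `start`."""
    if start < 1:
        raise ValueError("start must be >= 1")
    offset = (start - 1) % 3   # 0: starts at a first codon position, 1: second, 2: third
    first = (3 - offset) % 3   # nucleotides before the first codon boundary
    pieces = [string[:first]] if first else []
    pieces += [string[i:i + 3] for i in range(first, len(string), 3)]
    return " ".join(pieces)
```

For example, `codon_split("ttctgtta", 333)` gives "t tct gtt a", the form printed for character 034.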

Summary statistics from the string search (involving 
1000 randomly generated strings ranging in length from 
6 to 21 base pairs) are provided below. 


String  Total         Total        Total         Total       Total positional  Mean recognitions
length  recognitions  apomorphies  similarities  singletons  recognitions      per string
6       758           129          77            52          47                2.745
7       204           43           20            23          31                1.387
8       107           14           10            4           13                1.077
9       5             2            1             1           2                 1.000
10      4             2            1             1           2                 1.000
12      21            1            1             0           1                 1.000
14      5             1            1             0           1                 1.000
15      8             1            1             0           1                 1.000
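The summary columns are internally consistent, which offers a quick check on the transcribed values: in every row, total apomorphies equal similarities plus singletons, and the last column equals apomorphies divided by positional recognitions, rounded to three decimals. Summed over rows, this gives 193 apomorphies, of which 112 are similarities. The sketch below simply repeats that arithmetic on the table values; the names are hypothetical.

```python
# Values copied from the summary table: length -> (recognitions, apomorphies,
# similarities, singletons, positional recognitions, mean per string).
ROWS = {
    6: (758, 129, 77, 52, 47, 2.745),
    7: (204, 43, 20, 23, 31, 1.387),
    8: (107, 14, 10, 4, 13, 1.077),
    9: (5, 2, 1, 1, 2, 1.000),
    10: (4, 2, 1, 1, 2, 1.000),
    12: (21, 1, 1, 0, 1, 1.000),
    14: (5, 1, 1, 0, 1, 1.000),
    15: (8, 1, 1, 0, 1, 1.000),
}


def check_rows(rows):
    """Verify each row's internal arithmetic; return (total apomorphies, total similarities)."""
    for length, (_, apo, sim, single, positional, mean) in rows.items():
        assert apo == sim + single, f"row {length}: apomorphies != similarities + singletons"
        assert round(apo / positional, 3) == mean, f"row {length}: mean does not match"
    return (sum(r[1] for r in rows.values()), sum(r[2] for r in rows.values()))
```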


The 1000 strings evaluated contained the following
proportions of "nucleotides," which verify their random
generation:

A = 3317
C = 3309
G = 3349
T = 3297

The matrix of 193 string recognitions (including 112
potentially informative similarities) is also given below.
Headers are provided to give additional information about
each character. The number of nucleotides per string
character is given, followed by the number of recognitions
(hits) per string, the start position of the string (in terms
of rbcL nucleotides), and the character number (for
reference to the table of changes). Immediately following the
start position information, an "a" or a "b" may appear;
this indicates that separate string recognitions had
the same start position, and so are partially correlated
(such partial correlation has been ignored in the present
analyses; see text for further details). The matrix is
presented in two blocks, corresponding to two rounds of
evaluation (500 strings in each, for a total of 1000). In
each case, string recognitions occurring in the primer
region are shown in brackets, but were excluded during
parsimony analysis.
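The string search can be sketched under explicit assumptions, because the recognition criteria are set out in the main text rather than here: strings are drawn uniformly from {a, c, g, t} with lengths uniform between 6 and 21, a recognition is taken to be an exact match beginning at a given rbcL position in a taxon's sequence, and each distinct start position is scored as a presence/absence character. The functions, the seed, and the toy sequences used with them are hypothetical.

```python
import random
from typing import Dict, List


def random_strings(n: int, min_len: int = 6, max_len: int = 21, seed: int = 0) -> List[str]:
    """Generate n random nucleotide strings with lengths uniform in [min_len, max_len]."""
    rng = random.Random(seed)
    return ["".join(rng.choice("acgt") for _ in range(rng.randint(min_len, max_len)))
            for _ in range(n)]


def recognitions(string: str, sequences: Dict[str, str]) -> Dict[int, List[str]]:
    """Map each 1-based start position of an exact match to the taxa matching there."""
    hits: Dict[int, List[str]] = {}
    k = len(string)
    for taxon, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] == string:
                hits.setdefault(i + 1, []).append(taxon)
    return hits


def score_characters(hits: Dict[int, List[str]], taxa: List[str]) -> Dict[int, List[int]]:
    """One presence (1) / absence (0) character per start position, in taxon order."""
    return {pos: [1 if t in found else 0 for t in taxa] for pos, found in sorted(hits.items())}
```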
NODE 77 - 76, relative branch length = 0.0138
CHAR  POS.   STR., SEQ.     c      AA-seq.
032   313b   7, tta gat t   0.500  LDL
091   1254   6, t gct aa    1.000  VAN, AAN
100   1344   6, t gct aa    0.200  AAK, ASK, ACK, ATK, RTK
136   465    6, t caa gt    0.250  IQV
142   607    6, gat gaa     0.125  DE
172   980    7, ac gct gg   0.100  HAG, HSG, HTG
173   1017   6, t caa gt    0.500  RQV, RDV, REV, REI, RQI, RDL

NODE 76 - 75, relative branch length = 0.0138
CHAR  POS.   STR., SEQ.     c      AA-seq.
033   326    6, aa gaa g    0.125  EEG
043   487    6, aac aaa     0.143  NK
113   111    6, a gca gc    0.200  LAA
121   235    6, cgt tac     0.333  RY
137   728    6, ct gca g    0.167  TAG, TSG
152   750    6, g atg aa    0.100  MMK, MLK, MIK
183   1231   6, tgg gga     0.200  WG

NODE 75 - 74, relative branch length = 0.0138
CHAR  POS.   STR., SEQ.     c      AA-seq.
035   345    6, c atg tt    0.200  NMF, NLF
100   1344   6, t gct aa    0.200  AAK, ASK, ACK, ATK, RTK
116   162    6, a gca gc    0.250  GAA, GWA
150   724    6, gct act
158   830    8, at act agt  0.167  NTS, NMI, NTT
166   1259   7, at cga gt   0.143  NRV, N*3V

NODE 74 - 73, relative branch length = 0.0079
CHAR  POS.   STR., SEQ.     c      AA-seq.
018   152    6, aa gaa g    0.143  EEA
031   313    6, tta gat     0.500  LD
043   487    6, aac aaa     0.143  NK
092   1259   7, at cga gt   0.167  NRV

NODE 73 - 72, relative branch length = 0.0079
CHAR  POS.   STR., SEQ.       c      AA-seq.
034   333    8, t tct gtt a   0.167  GSVT
085   1147   6, cat gtt       0.143  HV
138   500    7, gt cct tt     0.250  RPL
183   1231   6, tgg gga       0.200  WG

NODE 72 - 42, relative branch length = 0.0079
CHAR  POS.   STR., SEQ.     c      AA-seq.
033   326    6, aa gaa g    0.125  EEG
115   155    7, aa gca gg   0.167  EAG
137   728    6, ct gca g    0.167  TAG, TSG
187   1284   6, a cag gc    0.250  VQA, VEA, VKA

NODE 42 - 41, relative branch length = 0.0178
CHARPOS. STR.,SEQ. c AA-seq. sc
044 543 6, t gct aa 0.143 SAK -
052 728 6, ct gca g 0.111 TAG, TSG L
068 939 7, a ttg gcc 1.000 VLA, VSA NL
077 1093 8, acc caa ga 0.250 TQD, PQD L
100 1344 6, t gct aa 0.200 AAK, ASK, ACK, ATK, RTK PNL
113 111 6, a gca gc 0.200 LAA =
136 465 6, t caa gt 0.250 IQV =
142 607 6, gat gaa 0.125 DE T
152 750 6, g atg aa 0.100 MMK, MLK, MIK L

NODE 72 - 71, relative branch length = 0.0079
CHARPOS. STR.,SEQ. c AA-seq. sc
026 266 6, ct gtt g 0.167 PVA, PVP, PVV, TVT, SVV PNL
053 755 7, aa aga gc 0.200 KRA =
114 126 6, g act cc 0.500 MTP, VTP, LTP, VSP L
193 1394 6, tc aag t 0.500 IKF, IRF, IIF PNL

NODE 71 - 70, relative branch length = 0.0138
CHARPOS. STR.,SEQ. c AA-seq. sc
034 333 8, t tct gtt a 0.167 GSVT -
044 543 6, t gct aa 0.143 SAK -
052 728 6, ct gca g 0.111 TAG, TSG L
092 1259 7, at cga gt 0.167 NRV -
142 607 6, gat gaa 0.125 DE =
152 750 6, g atg aa 0.100 MMK, MLK, MIK L

NODE 70 - 43, relative branch length = 0.0099
CHARPOS. STR.,SEQ. c AA-seq. sc
006 88 6, aag acc 0.500 ET, EP, KV, DT, QT, TP PNL
007 90 6, g acc aa 0.200 ETK, EPK, KVS, KTK, DTK, QTK, ETL, TPK PNL
114 126 6, g act cc 0.500 MTP, VTP, LTP, VSP L
138 500 7, gt cct tt 0.250 RPL *
158 830 8, at act agt 0.167 NTS, NMI, NIT PNL

NODE 70 - 69, relative branch length = 0.0039
CHARPOS. STR.,SEQ. c AA-seq. sc
033 326 6, aa gaa g 0.125 EEG -
183 1231 6, tgg gga 0.200 WG -

NODE 69 - 66, relative branch length = 0.0138
CHARPOS. STR.,SEQ. c AA-seq. sc
013 141 6, a gtt cc 0.333 GVP -
036 388 6, cta cga 0.200 LR, LP L
043 487 6, aac aaa 0.143 NK -
056 783 6, a gtt cc 0.250 GVP, GMP, GAP PNL
124 273 6, t ggg ga 0.167 AGE, PGE, VGE, TGE PNL
177 1182 6, t ggg ga 0.333 FGD -
193 1394 6, tc aag t 0.500 IKF, IRF, IIF PNL

NODE 66 - 65, relative branch length = 0.0079
CHARPOS. STR.,SEQ. c AA-seq. sc
010 123 12, a gta act cct ca 0.200 RVTPQ, RMTPQ, RLTPQ, RVSPQ L
076 1067 6, aa gac c 0.200 KFR, EDR NL
132 395 6, ct cta c 0.143 ALR, ASR, T?? NL
150 724 6, gct act 0.125 AT

NODE 65 - 51, relative branch length = 0.0059
CHARPOS. STR.,SEQ. c AA-seq.
085 1147 6, cat gtt 0.143 HV
152 750 6, g atg aa 0.100 MMK, MLK, MIK
166 1259 7, at cga gt 0.143 NRV, N*V

NODE 51 - 50, relative branch length = 0.0059
CHARPOS. STR.,SEQ. c AA-seq.
124 273 6, t ggg ga 0.167 AGE, PGE, VGE, TGE
130 386 6, ct cta c 0.167 ALR, ALP
142 607 6, gat gaa 0.125 DE

NODE 50 - 49, relative branch length = 0.0118
CHARPOS. STR.,SEQ. c AA-seq.
013 141 6, a gtt cc 0.333 GVP
035 345 6, c atg tt 0.200 NMF, NLF
047 635 6, tg cgt t 0.333 MRW
051 684 6, t cag gc 0.250 AQA, SQA, AQT, SQG
053 755 7, aa aga gc 0.200 KRA
143 639 15, c tgg aga gat cgt tt 0.500 RWRDRF

NODE 49 - 48, relative branch length = 0.0158
CHARPOS. STR.,SEQ. c AA-seq.
050 663 6, t gca ga 0.500 CAE, VAE, CAE
052 728 6, ct gca g 0.111 TAG, TSG
085 1147 6, cat gtt 0.143 HV
115 155 7, aa gca gg 0.167 EAG
119 199 6, acc act 0.200 TT
130 386 6, ct cta c 0.167 ALR, ALP
137 728 6, ct gca g 0.167 TAG, TSG
166 1259 7, at cga gt 0.143 NRV, N*V

NODE 49 - 44, relative branch length = 0.0138
CHARPOS. STR.,SEQ. c AA-seq.
031 313 6, tta gat 0.500 LD
038 412 6, cta cga 0.333 LR, SR
116 162 6, a gca gc 0.250 GAA, GWA
138 500 7, gt cct tt 0.250 RPL
142 607 6, gat gaa 0.125 DE
172 980 7, ac gct gg 0.100 HAG, HSG, HTG

NODE 48 - 47, relative branch length = 0.0079
CHARPOS. STR.,SEQ. c AA-seq.
017 164 6, ct gca g 0.167 AAV, WAV
053 755 7, aa aga gc 0.200 KRA
054 766 7, ttt gcc a 0.250 FAR, CAR, CAK
124 273 6, t ggg ga 0.167 AGE, PGE, VGE, TGE

NODE 47 - 46, relative branch length = 0.0158


CHARPOS. STR.,SEQ. c AA-seq. sc
002 54 14, a gat tac aga tta 0.200 KDYRL, RDYRL, KDYKL, KDYTI, KEYKL PNL
024 227 6, gt ctc g 0.250 SLD, NLD L
035 345 6, c atg tt 0.200 NMF, NLF
043 487 6, aac aaa 0.143 NK
049 655 6, tgc ttc 1.000 CF, LF, VF
076 1067 6, aa gac c 0.200 KFR, EDR
092 1259 7, at cga gt 0.167 NRV
152 750 6, g atg aa 0.100 MMK, MLK, MIK

NODE 46 - 45, relative branch length = 0.0059
CHARPOS. STR.,SEQ. c AA-seq.
012 140 7, gg gtg cc 0.333 GVP
132 395 6, ct cta c 0.143 ALR, ASR, T??
182 1207b 6, ggc ggg 0.500 GG

NODE 65 - 64, relative branch length = 0.0079
CHARPOS. STR.,SEQ. c AA-seq.
052 728 6, ct gca g 0.111 TAG, TSG
142 607 6, gat gaa 0.125 DE
168 950 6, cg tta c 0.250 ALR, ASC
172 980 7, ac gct gg 0.100 HAG, HSG, HTG

NODE 64 - 55, relative branch length = 0.0059
CHARPOS. STR.,SEQ. c AA-seq.
061 856 6, gac aac 0.200 DN
106 1418 6, at acc t 0.500 DTL, DVL, DTV, ILC, …
191 1355 8, gc cct gaa 0.500 SPE, SPD, SAE, SLE

NODE 55 - 54, relative branch length = 0.0039
CHARPOS. STR.,SEQ. c AA-seq.
054 766 7, ttt gcc a 0.250 FAR, CAR, CAK
152 750 6, g atg aa 0.100 MMK, MLK, MIK

NODE 54 - 52, relative branch length = 0.0020
CHARPOS. STR.,SEQ. c AA-seq.
021 198 6, g aca ac 0.250 WIT

NODE 54 - 53, relative branch length = 0.0059
CHARPOS. STR.,SEQ. c AA-seq.
073 1135 6, tca ggc 0.500 SG
079 1110 7, t ttg cca 1.000 SLP, STP, SLA, SMP
176 1138 6, ggc ggt 0.500 GG

NODE 64 - 63, relative branch length = 0.0020
CHARPOS. STR.,SEQ. c AA-seq.
192 1369 7, gct gct t 0.167 AAC

NODE 63 - 62, relative branch length = 0.0079
CHARPOS. STR.,SEQ. c AA-seq.
003 74 9, at acg cct g
054 766 7, ttt gcc a 0.250 FAR, CAR, CAK
092 1259 7, at cga gt 0.167 NRV
124 273 6, t ggg ga 0.167 AGE, PGE, VGE, TGE

NODE 62 - 61, relative branch length = 0.0039
CHARPOS. STR.,SEQ. c AA-seq.
098 1338 6, t gag gc 0.333 REA, EEA
175 1109 6, ct cta c 0.333 SLP, STP, SLA, …

NODE 61 - 56, relative branch length = 0.0039
CHARPOS. STR.,SEQ. c AA-seq.
015 146 8, ca cct gag 0.100 PPE, PAE, PSE
145 684 6, a cag gc 0.333 AQA, SQA, SQG, …

NODE 60 - 57, relative branch length = 0.0039
CHARPOS. STR.,SEQ. c AA-seq.
018 152 6, aa gaa g 0.143 EEA
090 1245 7, g ggt gcc 0.500 PGA, PGG, PVA, …

NODE 60 - 59, relative branch length = 0.0099
CHARPOS. STR.,SEQ. c AA-seq.
076 1067 6, aa gac c 0.200 KFR, EDR
100 1344 6, t gct aa 0.200 AAK, ASK, ACK, ATK, RTK
137 728 6, ct gca g 0.167 TAG, TSG
152 750 6, g atg aa 0.100 MMK, MLK, MIK
177 1182 6, t ggg ga 0.333 FGD

NODE 59 - 58, relative branch length = 0.0039
CHARPOS. STR.,SEQ. c AA-seq.
139 501 6, c ccc ct 1.000 RPL
146 686 9, ag gct gaa a 0.333 QAET, QGET, QTET

NODE 69 - 68, relative branch length = 0.0079
CHARPOS. STR.,SEQ. c AA-seq.
027 267 6, t gtt cc 0.250 PVP, PLP, PVV, TVT, SVV, PVA
029 543 6, t gct aa 1.000 SAK
137 728 6, ct gca g 0.167 TAG, TSG
166 1259 7, at cga gt 0.143 NRV, N*V

NODE 68 - 67, relative branch length = 0.0197
CHARPOS. STR.,SEQ. c AA-seq.
025 252 6, c tac ga 0.250 CYD, CYG, CYE, CYN, CYH
034 333 8, t tct gtt a 0.167 GSVT
047 635 6, tg cgt t 0.333 MRW
076 1067 6, aa gac c 0.200 KFR, EDR
092 1259 7, at cga gt 0.167 NRV
116 162 6, a gca gc 0.250 GAA, GWA
130 386 6, ct cta c 0.167 ALR, ALP
150 724 6, gct act 0.125 AT
170 966 6, t ggg ga 0.333 GGD
192 1369 7, gct gct t 0.167 AAC

Corrections in proof:
P. 564, under “NODE 70-43,” fourth line from bottom, delete comma after “TPK” and move “PNL” to right-hand column; p. 566, under “NODE 68-67,” bottom line, right-hand column, should read …